linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-01-14	kmemcg: account certain kmem allocations to memcg	Vladimir Davydov	52	-68/+82
	Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	vmalloc: allow to account vmalloc to memcg	Vladimir Davydov	1	-3/+3
	Make vmalloc family functions allocate vmalloc area pages with alloc_kmem_pages so that if __GFP_ACCOUNT is set they will be accounted to memcg. This is needed, at least, to account alloc_fdmem allocations. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	slab: add SLAB_ACCOUNT flag	Vladimir Davydov	6	-12/+26
	Currently, if we want to account all objects of a particular kmem cache, we have to pass __GFP_ACCOUNT to each kmem_cache_alloc call, which is inconvenient. This patch introduces SLAB_ACCOUNT flag which if passed to kmem_cache_create will force accounting for every allocation from this cache even if __GFP_ACCOUNT is not passed. This patch does not make any of the existing caches use this flag - it will be done later in the series. Note, a cache with SLAB_ACCOUNT cannot be merged with a cache w/o SLAB_ACCOUNT, because merged caches share the same kmem_cache struct and hence cannot have different sets of SLAB_* flags. Thus using this flag will probably reduce the number of merged slabs even if kmem accounting is not used (only compiled in). Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Suggested-by: Tejun Heo <tj@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	memcg: only account kmem allocations marked as __GFP_ACCOUNT	Vladimir Davydov	3	-1/+13
	Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be fragile and difficult to maintain, because there seem to be many more allocations that should not be accounted than those that should be. Besides, false accounting an allocation might result in much worse consequences than not accounting at all, namely increased memory consumption due to pinned dead kmem caches. So this patch switches kmem accounting to the white-policy: now only those kmem allocations that are marked as __GFP_ACCOUNT are accounted to memcg. Currently, no kmem allocations are marked like this. The following patches will mark several kmem allocations that are known to be easily triggered from userspace and therefore should be accounted to memcg. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	Revert "gfp: add __GFP_NOACCOUNT"	Vladimir Davydov	3	-6/+1
	This reverts commit 8f4fc071b192 ("gfp: add __GFP_NOACCOUNT"). Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be fragile and difficult to maintain, because there seem to be many more allocations that should not be accounted than those that should be. Besides, false accounting an allocation might result in much worse consequences than not accounting at all, namely increased memory consumption due to pinned dead kmem caches. So it was decided to switch to the white-list policy. This patch reverts bits introducing the black-list policy. The white-list policy will be introduced later in the series. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	Revert "kernfs: do not account ino_ida allocations to memcg"	Vladimir Davydov	1	-8/+1
	Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc, alloc_kmem_pages call) are accounted to memory cgroup automatically. Callers have to explicitly opt out if they don't want/need accounting for some reason. Such a design decision leads to several problems: - kmalloc users are highly sensitive to failures, many of them implicitly rely on the fact that kmalloc never fails, while memcg makes failures quite plausible. - A lot of objects are shared among different containers by design. Accounting such objects to one of containers is just unfair. Moreover, it might lead to pinning a dead memcg along with its kmem caches, which aren't tiny, which might result in noticeable increase in memory consumption for no apparent reason in the long run. - There are tons of short-lived objects. Accounting them to memcg will only result in slight noise and won't change the overall picture, but we still have to pay accounting overhead. For more info, see - http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz - http://lkml.kernel.org/r/20151106090555.GK29259@esperanza Therefore this patchset switches to the white list policy. Now kmalloc users have to explicitly opt in by passing __GFP_ACCOUNT flag. Currently, the list of accounted objects is quite limited and only includes those allocations that (1) are known to be easily triggered from userspace and (2) can fail gracefully (for the full list see patch no. 6) and it still misses many object types. However, accounting only those objects should be a satisfactory approximation of the behavior we used to have for most sane workloads. This patch (of 6): Revert 499611ed451508a42d1d7d ("kernfs: do not account ino_ida allocations to memcg"). Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be fragile and difficult to maintain, because there seem to be many more allocations that should not be accounted than those that should be. Besides, false accounting an allocation might result in much worse consequences than not accounting at all, namely increased memory consumption due to pinned dead kmem caches. So it was decided to switch to the white-list policy. This patch reverts bits introducing the black-list policy. The white-list policy will be introduced later in the series. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	mm/slab.c: add a helper function get_first_slab	Geliang Tang	1	-18/+21
	Add a new helper function get_first_slab() that get the first slab from a kmem_cache_node. Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	mm/slab.c: use list_for_each_entry in cache_flusharray	Geliang Tang	1	-7/+2
	Simplify the code with list_for_each_entry(). Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	mm/slab.c use list_first_entry_or_null()	Geliang Tang	1	-12/+12
	Simplify the code with list_first_entry_or_null(). Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	include/linux/dcache.h: remove semicolons from HASH_LEN_DECLARE	Andrew Morton	1	-2/+2
	A little cleanup - the invocation site provdes the semicolon. Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2/dlm: cleanup redunant lksb flags in dlmcommon.h	Joseph Qi	1	-11/+0
	lksb flags are defined both in dlmapi.h and dlmcommon.h. So clean them up from dlmcommon.h. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: dlm: remove redundant code	Junxiao Bi	1	-5/+1
	Found this when do patch review, remove to make it clear and save a little cpu time. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: access orphan dinode before delete entry in ocfs2_orphan_del	Joseph Qi	1	-9/+9
	In ocfs2_orphan_del, currently it finds and deletes entry first, and then access orphan dir dinode. This will have a problem once ocfs2_journal_access_di fails. In this case, entry will be removed from orphan dir, but in deed the inode hasn't been deleted successfully. In other words, the file is missing but not actually deleted. So we should access orphan dinode first like unlink and rename. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2/dlm: do not insert a new mle when another process is already migrating	xuejiufei	1	-2/+3
	When two processes are migrating the same lockres, dlm_add_migration_mle() return -EEXIST, but insert a new mle in hash list. dlm_migrate_lockres() will detach the old mle and free the new one which is already in hash list, that will destroy the list. Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2/dlm: ignore cleaning the migration mle that is inuse	xuejiufei	1	-11/+15
	We have found that migration source will trigger a BUG that the refcount of mle is already zero before put when the target is down during migration. The situation is as follows: dlm_migrate_lockres dlm_add_migration_mle dlm_mark_lockres_migrating dlm_get_mle_inuse <<<<<< Now the refcount of the mle is 2. dlm_send_one_lockres and wait for the target to become the new master. <<<<<< o2hb detect the target down and clean the migration mle. Now the refcount is 1. dlm_migrate_lockres woken, and put the mle twice when found the target goes down which trigger the BUG with the following message: "ERROR: bad mle: ". Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: do not lock/unlock() inode DLM lock	Goldwyn Rodrigues	1	-8/+0
	DLM does not cache locks. So, blocking lock and unlock will only make the performance worse where contention over the locks is high. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: fix slot overwritten if storage link down during mount	jiangyiwen	1	-1/+10
	The following case will lead to slot overwritten. N1 N2 mount ocfs2 volume, find and allocate slot 0, then set osb->slot_num to 0, begin to write slot info to disk mount ocfs2 volume, wait for super lock write block fail because of storage link down, unlock super lock got super lock and also allocate slot 0 then unlock super lock mount fail and then dismount, since osb->slot_num is 0, try to put invalid slot to disk. And it will succeed if storage link restores. N2 slot info is now overwritten Once another node say N3 mount, it will find and allocate slot 0 again, which will lead to mount hung because journal has already been locked by N2. so when write slot info failed, invalidate slot in advance to avoid overwrite slot. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2/dlm: return appropriate value when dlm_grab() returns NULL	Xue jiufei	2	-2/+2
	dlm_grab() may return NULL when the node is doing unmount. When doing code review, we found that some dlm handlers may return error to caller when dlm_grab() returns NULL and make caller BUG or other problems. Here is an example: Node 1 Node 2 receives migration message from node 3, and send migrate request to others start unmounting receives migrate request from node 1 and call dlm_migrate_request_handler() unmount thread unregisters domain handlers and removes dlm_context from dlm_domains dlm_migrate_request_handlers() returns -EINVAL to node 1 Exit migration neither clearing the migration state nor sending assert master message to node 3 which cause node 3 hung. Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: clean up redundant NULL check before iput	Joseph Qi	7	-25/+11
	Since iput will take care the NULL check itself, NULL check before calling it is redundant. So clean them up. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2/dlm: wait until DLM_LOCK_RES_SETREF_INPROG is cleared in dlm_deref_lockres_worker	jiangyiwen	1	-1/+1
	Commit f3f854648de6 ("ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres") still exists a race which can't ensure the ordering is exactly correct. Node1 Node2 Node3 umount, migrate lockres to Node2 migrate finished, send migrate request to Node3 received migrate request, create a migration_mle, respond to Node2. set DLM_LOCK_RES_SETREF_INPROG and send assert master to Node3 delete migration_mle in assert_master_handler, Node3 umount without response dlm_thread purge this lockres, send drop deref message to Node2 found the flag of DLM_LOCK_RES_SETREF_INPROG is set, dispatch dlm_deref_lockres_worker to clear refmap, but in function of dlm_deref_lockres_worker, only if node in refmap it wait DLM_LOCK_RES_SETREF_INPROG to be cleared. So worker is done successfully purge lockres, send assert master response to Node1, and finish umount set Node3 in refmap, and it won't be cleared forever, thus lead to umount hung so wait until DLM_LOCK_RES_SETREF_INPROG is cleared in dlm_deref_lockres_worker. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: constify ocfs2_extent_tree_operations structures	Julia Lawall	2	-7/+7
	The ocfs2_extent_tree_operations structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2/dlm: fix a race between purge and migration	Xue jiufei	1	-1/+8
	We found a race between purge and migration when doing code review. Node A put lockres to purgelist before receiving the migrate message from node B which is the master. Node A call dlm_mig_lockres_handler to handle this message. dlm_mig_lockres_handler dlm_lookup_lockres >>>>>> race window, dlm_run_purge_list may run and send deref message to master, waiting the response spin_lock(&res->spinlock); res->state \|= DLM_LOCK_RES_MIGRATING; spin_unlock(&res->spinlock); dlm_mig_lockres_handler returns >>>>>> dlm_thread receives the response from master for the deref message and triggers the BUG because the lockres has the state DLM_LOCK_RES_MIGRATING with the following message: dlm_purge_lockres:209 ERROR: 6633EB681FA7474A9C280A4E1A836F0F: res M0000000000000000030c0300000000 in use after deref Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: o2hb: increase unsteady iterations	Junxiao Bi	1	-2/+2
	When run multiple xattr test of ocfs2-test on a three-nodes cluster, mount failed sometimes with the following message. o2hb: Unable to stabilize heartbeart on region D18B775E758D4D80837E8CF3D086AD4A (xvdb) Stabilize heartbeat depends on the timing order to mount ocfs2 from cluster nodes and how fast the tcp connections are established. So increase unsteady interations to leave more time for it. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: return non-zero st_blocks for inline data	John Haxby	1	-0/+8
	Some versions of tar assume that files with st_blocks == 0 do not contain any data and will skip reading them entirely. See also commit 9206c561554c ("ext4: return non-zero st_blocks for inline data"). Signed-off-by: John Haxby <john.haxby@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Gang He <ghe@suse.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	ocfs2: optimize bad declarations and redundant assignment	Norton.Zhu	1	-6/+2
	In ocfs2_parse_options, a) it's better to declare variables(small size) outside of while loop; b) 'option' will be set by match_int, 'option = 0;' makes no sense, if match_int failed, it just goto bail and return. Signed-off-by: Norton.Zhu <norton.zhu@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	logfs: fix logfs build errors and dependencies	Arnd Bergmann	1	-1/+1
	Fix build errors that happen when CONFIG_LOGFS=y and CONFIG_MTD=m: fs/built-in.o: In function `logfs_mount': super.c:(.text+0x92a6f): undefined reference to `logfs_get_sb_mtd' fs/built-in.o: In function `logfs_get_sb_bdev': (.text+0x93530): undefined reference to `logfs_get_sb_mtd' This patch avoids the error by changing the dependencies of logfs in a way that we can no longer configure logfs as built-in when the MTD core is a loadable module, while leaving the dependency to require at least one of MTD or BLOCK to be enabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Peter Chen <peter.chen@freescale.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	modpost: don't add a trailing wildcard for OF module aliases	Javier Martinez Canillas	1	-2/+1
	Commit ac551828993e ("modpost: i2c aliases need no trailing wildcard") removed the wildcard at the end of the I2C module aliases because I2C devices have no IDs so the aliases are just arbitrary device names. This is also true for OF modaliases since a compatible string is used to define a specific IP hardware block. So the modalias should match a specific compatible string and not attempt to match a compatible string whose name matches the beginning of another one. For example, the following driver module: $ modinfo cros_ec_keyb \| grep alias alias: platform:cros-ec-keyb alias: of:NTCgoogle,cros-ec-keyb* will be tried to be loaded for an alias of:NTCgoogle,cros-ec-keyb-v2 but there could be a different driver that supports the device for that compatible string so it's better to remove the trailing wildcard for OF. Also, remove the word "always" from the add_wildcard() function comment since that was carried from the time where a wildcard was always added at the end of the module alias for all the devices. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Suggested-by: Brian Norris <computersforpeace@gmail.com> Reviewed-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	fsnotify: destroy marks with call_srcu instead of dedicated thread	Jeff Layton	2	-53/+18
	At the time that this code was originally written, call_srcu didn't exist, so this thread was required to ensure that we waited for that SRCU grace period to settle before finally freeing the object. It does exist now however and we can much more efficiently use call_srcu to handle this. That also allows us to potentially use srcu_barrier to ensure that they are all of the callbacks have run before proceeding. In order to conserve space, we union the rcu_head with the g_list. This will be necessary for nfsd which will allocate marks from a dedicated slabcache. We have to be able to ensure that all of the objects are destroyed before destroying the cache. That's fairly Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: Eric Paris <eparis@parisplace.org> Reviewed-by: Jan Kara <jack@suse.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	fs/notify/inode_mark.c: use list_next_entry in fsnotify_unmount_inodes	Geliang Tang	1	-2/+1
	To make the intention clearer, use list_next_entry instead of list_entry. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	scripts/bloat-o-meter: fix python3 syntax error	Sergey Senozhatsky	1	-4/+4
	In Python3+ print is a function so the old syntax is not correct anymore: $ ./scripts/bloat-o-meter vmlinux.o vmlinux.o.old File "./scripts/bloat-o-meter", line 61 print "add/remove: %s/%s grow/shrink: %s/%s up/down: %s/%s (%s)" % \ ^ SyntaxError: invalid syntax Fix by calling print as a function. Tested on python 2.7.11, 3.5.1 Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	dma-debug: switch check from _text to _stext	Laura Abbott	1	-1/+1
	In include/asm-generic/sections.h: /* * Usage guidelines: * _text, _data: architecture specific, don't use them in * arch-independent code * [_stext, _etext]: contains .text.* sections, may also contain * .rodata.* * and/or .init.* sections _text is not guaranteed across architectures. Architectures such as ARM may reuse parts which are not actually text and erroneously trigger a bug. Switch to using _stext which is guaranteed to contain text sections. Came out of https://lkml.kernel.org/g/<567B1176.4000106@redhat.com> Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14	m32r: fix m32104ut_defconfig build fail	Sudip Mukherjee	1	-0/+3
	The build of m32104ut_defconfig for m32r arch was failing for long long time with the error: ERROR: "memory_start" [fs/udf/udf.ko] undefined! ERROR: "memory_end" [fs/udf/udf.ko] undefined! ERROR: "memory_end" [drivers/scsi/sg.ko] undefined! ERROR: "memory_start" [drivers/scsi/sg.ko] undefined! ERROR: "memory_end" [drivers/i2c/i2c-dev.ko] undefined! ERROR: "memory_start" [drivers/i2c/i2c-dev.ko] undefined! As done in other architectures export the symbols to fix the error. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-13	efi: include asm/early_ioremap.h not asm/efi.h to get early_memremap	Ard Biesheuvel	1	-1/+1
	The code in efi.c uses early_memremap(), but relies on a transitive include rather than including asm/early_ioremap.h directly, since this header did not exist on ia64. Commit f7d924894265 ("arm64/efi: refactor EFI init and runtime code for reuse by 32-bit ARM") attempted to work around this by including asm/efi.h, which transitively includes asm/early_ioremap.h on most architectures. However, since asm/efi.h does not exist on ia64 either, this is not much of an improvement. Now that we have created an asm/early_ioremap.h for ia64, we can just include it directly. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2016-01-13	ia64: split off early_ioremap() declarations into asm/early_ioremap.h	Ard Biesheuvel	2	-4/+11
	Unlike x86, arm64 and ARM, ia64 does not declare its implementations of early_ioremap/early_iounmap/early_memremap/early_memunmap in a header file called <asm/early_ioremap.h> This complicates the use of these functions in generic code, since the header cannot be included directly, and we have to rely on transitive includes, which is fragile. So create a <asm/early_ioremap.h> for ia64, and move the existing definitions into it. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2016-01-12	net: bnxt: always return values from _bnxt_get_max_rings	Arnd Bergmann	1	-7/+9
	Newly added code in the bnxt driver uses a couple of variables that are never initialized when CONFIG_BNXT_SRIOV is not set, and gcc correctly warns about that: In file included from include/linux/list.h:8:0, from include/linux/module.h:9, from drivers/net/ethernet/broadcom/bnxt/bnxt.c:10: drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_get_max_rings': include/linux/kernel.h:794:26: warning: 'cp' may be used uninitialized in this function [-Wmaybe-uninitialized] include/linux/kernel.h:794:26: warning: 'tx' may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/net/ethernet/broadcom/bnxt/bnxt.c:5730:11: warning: 'rx' may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/net/ethernet/broadcom/bnxt/bnxt.c:5736:6: note: 'rx' was declared here This changes the condition so that we fall back to using the PF data if VF is not available, and always initialize the variables to something useful. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 6e6c5a57fbe1 ("bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.") Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-12	net: bpf: reject invalid shifts	Rabin Vincent	2	-0/+15
	On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a constant shift that can't be encoded in the immediate field of the UBFM/SBFM instructions is passed to the JIT. Since these shifts amounts, which are negative or >= regsize, are invalid, reject them in the eBPF verifier and the classic BPF filter checker, for all architectures. Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-12	ipmi: Remove unnecessary pci_disable_device.	Dave Jones	1	-1/+0
	We call cleanup_one_si from ipmi_pci_remove, which calls ->addr_source_cleanup, which gets set to point to ipmi_pci_cleanup, which does a pci_disable_device. On return from this, we do a second pci_disable_device, which results in the trace below. ipmi_si 0000:00:16.0: disabling already-disabled device Call Trace: [<ffffffff818ce54c>] dump_stack+0x45/0x57 [<ffffffff810525f7>] warn_slowpath_common+0x97/0xe0 [<ffffffff810526f6>] warn_slowpath_fmt+0x46/0x50 [<ffffffff81497ca1>] pci_disable_device+0xb1/0xc0 [<ffffffffa00851a5>] ipmi_pci_remove+0x25/0x30 [ipmi_si] [<ffffffff8149a696>] pci_device_remove+0x46/0xc0 [<ffffffff8156801f>] __device_release_driver+0x7f/0xf0 [<ffffffff81568978>] driver_detach+0xb8/0xc0 [<ffffffff81567e50>] bus_remove_driver+0x50/0xa0 [<ffffffff8156914e>] driver_unregister+0x2e/0x60 [<ffffffff8149a3e5>] pci_unregister_driver+0x25/0x90 [<ffffffffa0085804>] cleanup_ipmi_si+0xd4/0xf0 [ipmi_si] [<ffffffff810c727a>] SyS_delete_module+0x12a/0x200 [<ffffffff818d4d72>] system_call_fastpath+0x12/0x17 Signed-off-by: Dave Jones <dsj@fb.com>
2016-01-12	char: ipmi: Drop owner assignment from i2c_driver	Krzysztof Kozlowski	1	-1/+0
	i2c_driver does not need to set an owner because i2c_register_driver() will set it. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2016-01-12	ipmi: constify some struct and char arrays	LABBE Corentin	2	-14/+20
	Lots of char arrays could be set as const since they contain only literal char arrays. We could in the same time make const some struct members who are pointer to those const char arrays. Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2016-01-12	phonet: properly unshare skbs in phonet_rcv()	Eric Dumazet	1	-0/+4
	Ivaylo Dimitrov reported a regression caused by commit 7866a621043f ("dev: add per net_device packet type chains"). skb->dev becomes NULL and we crash in __netif_receive_skb_core(). Before above commit, different kind of bugs or corruptions could happen without major crash. But the root cause is that phonet_rcv() can queue skb without checking if skb is shared or not. Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests. Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-12	dwc_eth_qos: Fix dma address for multi-fragment skbs	Lars Persson	1	-1/+1
	The offset inside the fragment was not used for the dma address and silent data corruption resulted because TSO makes the checksum match. Fixes: 077742dac2c7 ("dwc_eth_qos: Add support for Synopsys DWC Ethernet QoS") Signed-off-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-12	phy: remove an unneeded condition	Dan Carpenter	1	-1/+1
	It used to be that bus->irq was a pointer but after e7f4dc3536a4 ('mdio: Move allocation of interrupts into core') it's an array inside the mdio struct, so it can never be NULL. Let's remove the check. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-12	mdio: remove an unneed condition	Dan Carpenter	1	-4/+2
	It used to be that mdio->irq was a pointer but after e7f4dc3536a4 ('mdio: Move allocation of interrupts into core') it's an array inside the mdio struct so it can never be NULL. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-12	mdio_bus: NULL dereference on allocation error	Dan Carpenter	1	-5/+6
	If bus = kzalloc() fails then we end up dereferencing bus when we do "bus->irq[i] = PHY_POLL;". The code is a little simpler if we reverse the NULL check and return directly on failure. Fixes: e7f4dc3536a4 ('mdio: Move allocation of interrupts into core') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-12	kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL	Huaitong Han	1	-1/+2
	vmx_cpuid_tries to update SECONDARY_VM_EXEC_CONTROL in the VMCS, but it will cause a vmwrite error on older CPUs because the code does not check for the presence of CPU_BASED_ACTIVATE_SECONDARY_CONTROLS. This will get rid of the following trace on e.g. Core2 6600: vmwrite error: reg 401e value 10 (err 12) Call Trace: [<ffffffff8116e2b9>] dump_stack+0x40/0x57 [<ffffffffa020b88d>] vmx_cpuid_update+0x5d/0x150 [kvm_intel] [<ffffffffa01d8fdc>] kvm_vcpu_ioctl_set_cpuid2+0x4c/0x70 [kvm] [<ffffffffa01b8363>] kvm_arch_vcpu_ioctl+0x903/0xfa0 [kvm] Fixes: feda805fe7c4ed9cf78158e73b1218752e3b4314 Cc: stable@vger.kernel.org Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-11	net: Fix typo in netdev_intersect_features	Tom Herbert	1	-2/+2
	Obviously need to 'or in NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM. Fixes: c8cd0989bd151f ("net: Eliminate NETIF_F_GEN_CSUM and NETIF_F_V[46]_CSUM") Reported-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11	net: freescale: mac-fec: Fix build error from phy_device API change	Andrew Lunn	1	-1/+1
	dev has moved inside the new mdio structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11	net: freescale: ucc_geth: Fix build error from phy_device API change	Andrew Lunn	1	-1/+1
	dev has moved inside the new mdio structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11	cgroup: rename cgroup documentations	Tejun Heo	14	-0/+0
	cgroup-legacy may be too loaded. Rename the docs so that they're postfixed with v1 and v2. * s/cgroup-legacy/cgroup-v1/ * s/cgroup.txt/cgroup-v2.txt/ Signed-off-by: Tejun Heo <tj@kernel.org>
2016-01-11	Input: elantech - mark protocols v2 and v3 as semi-mt	Benjamin Tissoires	1	-1/+1
	When using a protocol v2 or v3 hardware, elantech uses the function elantech_report_semi_mt_data() to report data. This devices are rather creepy because if num_finger is 3, (x2,y2) is (0,0). Yes, only one valid touch is reported. Anyway, userspace (libinput) is now confused by these (0,0) touches, and detect them as palm, and rejects them. Commit 3c0213d17a09 ("Input: elantech - fix semi-mt protocol for v3 HW") was sufficient enough for xf86-input-synaptics and libinput before it has palm rejection. Now we need to actually tell libinput that this device is a semi-mt one and it should not rely on the actual values of the 2 touches. Cc: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>