linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2012-01-04	RDMA/cma: Fix endianness bugs	Sean Hefty	1	-3/+3
	Fix endianness bugs reported by sparse in the RDMA core stack. Note that these are real bugs, but don't affect any existing code to the best of my knowledge. The mlid issue would only affect kernel users of rdma_join_multicast which have the rdma_cm attach/detach its QP. There are no current in tree users that do this. (rdma_join_multicast may be used called by user space applications, which does not have this issue.) And the pkey setting is simply returned as informational. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-12-23	Linux 3.2-rc7	Linus Torvalds	1	-1/+1

2011-12-23	netfilter: xt_connbytes: handle negation correctly	Florian Westphal	1	-3/+3
	"! --connbytes 23:42" should match if the packet/byte count is not in range. As there is no explict "invert match" toggle in the match structure, userspace swaps the from and to arguments (i.e., as if "--connbytes 42:23" were given). However, "what <= 23 && what >= 42" will always be false. Change things so we use "\|\|" in case "from" is larger than "to". This change may look like it breaks backwards compatibility when "to" is 0. However, older iptables binaries will refuse "connbytes 42:0", and current releases treat it to mean "! --connbytes 0:42", so we should be fine. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23	Btrfs: call d_instantiate after all ops are setup	Al Viro	1	-4/+5
	This closes races where btrfs is calling d_instantiate too soon during inode creation. All of the callers of btrfs_add_nondir are updated to instantiate after the inode is fully setup in memory. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-23	Btrfs: fix worker lock misuse in find_worker	Chris Mason	1	-1/+2
	Dan Carpenter noticed that we were doing a double unlock on the worker lock, and sometimes picking a worker thread without the lock held. This fixes both errors. Signed-off-by: Chris Mason <chris.mason@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2011-12-23	net: relax rcvbuf limits	Eric Dumazet	3	-10/+6
	skb->truesize might be big even for a small packet. Its even bigger after commit 87fb4b7b533 (net: more accurate skb truesize) and big MTU. We should allow queueing at least one packet per receiver, even with a low RCVBUF setting. Reported-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22	rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()	Xi Wang	1	-2/+5
	Setting a large rps_flow_cnt like (1 << 30) on 32-bit platform will cause a kernel oops due to insufficient bounds checking. if (count > 1<<30) { /* Enforce a limit to prevent overflow / return -EINVAL; } count = roundup_pow_of_two(count); table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count)); Note that the macro RPS_DEV_FLOW_TABLE_SIZE(count) is defined as: ... + (count sizeof(struct rps_dev_flow)) where sizeof(struct rps_dev_flow) is 8. (1 << 30) * 8 will overflow 32 bits. This patch replaces the magic number (1 << 30) with a symbolic bound. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22	net: introduce DST_NOPEER dst flag	Eric Dumazet	4	-4/+5
	Chris Boot reported crashes occurring in ipv6_select_ident(). [ 461.457562] RIP: 0010:[<ffffffff812dde61>] [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7 [ 461.578229] Call Trace: [ 461.580742] <IRQ> [ 461.582870] [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2 [ 461.589054] [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155 [ 461.595140] [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b [ 461.601198] [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6] [ 461.608786] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.614227] [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543 [ 461.620659] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.626440] [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge] [ 461.633581] [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459 [ 461.639577] [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge] [ 461.646887] [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge] [ 461.653997] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.659473] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.665485] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.671234] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.677299] [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge] [ 461.684891] [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack] [ 461.691520] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.697572] [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge] [ 461.704616] [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge] [ 461.712329] [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge] [ 461.719490] [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge] [ 461.727223] [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge] [ 461.734292] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.739758] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.746203] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.751950] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.758378] [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge] This is caused by bridge netfilter special dst_entry (fake_rtable), a special shared entry, where attaching an inetpeer makes no sense. Problem is present since commit 87c48fa3b46 (ipv6: make fragment identifications less predictable) Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and __ip_select_ident() fallback to the 'no peer attached' handling. Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22	mqprio: Avoid panic if no options are provided	Thomas Graf	1	-1/+1
	Userspace may not provide TCA_OPTIONS, in fact tc currently does so not do so if no arguments are specified on the command line. Return EINVAL instead of panicing. Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22	bridge: provide a mtu() method for fake_dst_ops	Eric Dumazet	1	-0/+6
	Commit 618f9bc74a039da76 (net: Move mtu handling down to the protocol depended handlers) forgot the bridge netfilter case, adding a NULL dereference in ip_fragment(). Reported-by: Chris Boot <bootc@bootc.net> CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23	md/bitmap: It is OK to clear bits during recovery.	NeilBrown	1	-4/+1
	commit d0a4bb492772ce5c4bdfba3744a99ed6f6fb238f introduced a regression which is annoying but fairly harmless. When writing to an array that is undergoing recovery (a spare in being integrated into the array), writing to the array will set bits in the bitmap, but they will not be cleared when the write completes. For bits covering areas that have not been recovered yet this is not a problem as the recovery will clear the bits. However bits set in already-recovered region will stay set and never be cleared. This doesn't risk data integrity. The only negatives are: - next time there is a crash, more resyncing than necessary will be done. - the bitmap doesn't look clean, which is confusing. While an array is recovering we don't want to update the 'events_cleared' setting in the bitmap but we do still want to clear bits that have very recently been set - providing they were written to the recovering device. So split those two needs - which previously both depended on 'success' and always clear the bit of the write went to all devices. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23	md: don't give up looking for spares on first failure-to-add	NeilBrown	1	-2/+1
	Before performing a recovery we try to remove any spares that might not be working, then add any that might have become relevant. Currently we abort on the first spare that cannot be added. This is a false optimisation. It is conceivable that - depending on rules in the personality - a subsequent spare might be accepted. Also the loop does other things like count the available spares and reset the 'recovery_offset' value. If we abort early these might not happen properly. So remove the early abort. In particular if you have an array what is undergoing recovery and which has extra spares, then the recovery may not restart after as reboot as the could of 'spares' might end up as zero. Reported-by: Anssi Hannula <anssi.hannula@iki.fi> Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23	md/raid5: ensure correct assessment of drives during degraded reshape.	NeilBrown	1	-4/+10
	While reshaping a degraded array (as when reshaping a RAID0 by first converting it to a degraded RAID4) we currently get confused about which devices are in_sync. In most cases we get it right, but in the region that is being reshaped we need to treat non-failed devices as in-sync when we have the data but haven't actually written it out yet. Reported-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23	md/linear: fix hot-add of devices to linear arrays.	NeilBrown	1	-0/+1
	commit d70ed2e4fafdbef0800e73942482bb075c21578b broke hot-add to a linear array. After that commit, metadata if not written to devices until they have been fully integrated into the array as determined by saved_raid_disk. That patch arranged to clear that field after a recovery completed. However for linear arrays, there is no recovery - the integration is instantaneous. So we need to explicitly clear the saved_raid_disk field. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-22	sparc64: Fix MSIQ HV call ordering in pci_sun4v_msiq_build_irq().	David S. Miller	1	-2/+2
	This silently was working for many years and stopped working on Niagara-T3 machines. We need to set the MSIQ to VALID before we can set it's state to IDLE. On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL errors. The hypervisor documentation says, rather ambiguously, that the MSIQ must be "initialized" before one can set the state. I previously understood this to mean merely that a successful setconf() operation has been performed on the MSIQ, which we have done at this point. But it seems to also mean that it has been set VALID too. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22	pata_of_platform: Add missing CONFIG_OF_IRQ dependency.	David Miller	1	-1/+1
	Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-12-22	ipv4: using prefetch requires including prefetch.h	Stephen Rothwell	1	-0/+1
	Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-22	VFS: Fix race between CPU hotplug and lglocks	Srivatsa S. Bhat	1	-4/+32
	Currently, the _global_[un]lock_online() routines are not at all synchronized with CPU hotplug. Soft-lockups detected as a consequence of this race was reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng for finding out that the root-cause of this issue is the race condition between br_write_[un]lock() and CPU hotplug, which results in the lock states getting messed up). Fixing this race by just adding {get,put}_online_cpus() at appropriate places in _global_[un]lock_online() is not a good option, because, then suddenly br_write_[un]lock() would become blocking, whereas they have been kept as non-blocking all this time, and we would want to keep them that way. So, overall, we want to ensure 3 things: 1. br_write_lock() and br_write_unlock() must remain as non-blocking. 2. The corresponding lock and unlock of the per-cpu spinlocks must not happen for different sets of CPUs. 3. Either prevent any new CPU online operation in between this lock-unlock, or ensure that the newly onlined CPU does not proceed with its corresponding per-cpu spinlock unlocked. To achieve all this: (a) We introduce a new spinlock that is taken by the _global_lock_online() routine and released by the _global_unlock_online() routine. (b) We register a callback for CPU hotplug notifications, and this callback takes the same spinlock as above. (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is initialized in the lock_init() code, all future updates to it are done in the callback, under the above spinlock. (d) The above bitmap is used (instead of cpu_online_mask) while locking and unlocking the per-cpu locks. The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the br_write_lock-unlock sequence is in progress, the callback keeps spinning, thus preventing the CPU online operation till the lock-unlock sequence is complete. This takes care of requirement (3). The bitmap that we maintain remains unmodified throughout the lock-unlock sequence, since all updates to it are managed by the callback, which takes the same spinlock as the one taken by the lock code and released only by the unlock routine. Combining this with (d) above, satisfies requirement (2). Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug operations from racing with br_write_lock-unlock, requirement (1) is also taken care of. By the way, it is to be noted that a CPU offline operation can actually run in parallel with our lock-unlock sequence, because our callback doesn't react to notifications earlier than CPU_DEAD (in order to maintain our bitmap properly). And this means, since we use our own bitmap (which is stale, on purpose) during the lock-unlock sequence, we could end up unlocking the per-cpu lock of an offline CPU (because we had locked it earlier, when the CPU was online), in order to satisfy requirement (2). But this is harmless, though it looks a bit awkward. Debugged-by: Cong Meng <mc@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org
2011-12-21	vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL	Dave Kleikamp	1	-5/+2
	lockdep reports a deadlock in jfs because a special inode's rw semaphore is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not used when __read_cache_page() calls add_to_page_cache_lru(). Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-21	USB: Fix usb/isp1760 build on sparc	David Miller	1	-4/+4
	This commit: commit 8f5d621543cb064d2989fc223d3c2bc61a43981e Author: Joachim Foerster <joachim.foerster@missinglinkelectronics.com> Date: Mon Oct 10 18:06:54 2011 +0200 usb/isp1760: Let OF bindings depend on general CONFIG_OF instead of PPC_OF . To be able to use the driver on other OF-aware architectures, too. And add necessary OF related #includes to fix compilation error. Signed-off-by: Joachim Foerster <joachim.foerster@missinglinkelectronics.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> enabled the build on all CONFIG_OF architectures, but it cannot do this. This driver depends upon CONFIG_OF_IRQ but not all CONFIG_OF platforms support that infrastructure, in particular Sparc does not so the build fails. Please push a patch like the following to Linus so that this code only gets built where it actually should. -------------------- usb/isp1760: Add missing CONFIG_OF_IRQ dependency on OF code. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21	net: Add a flow_cache_flush_deferred function	Steffen Klassert	3	-4/+27
	flow_cach_flush() might sleep but can be called from atomic context via the xfrm garbage collector. So add a flow_cache_flush_deferred() function and use this if the xfrm garbage colector is invoked from within the packet path. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-21	ipv4: reintroduce route cache garbage collector	Eric Dumazet	1	-0/+107
	Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer) removed IP route cache garbage collector a bit too soon, as this gc was responsible for expired routes cleanup, releasing their neighbour reference. As pointed out by Robert Gladewitz, recent kernels can fill and exhaust their neighbour cache. Reintroduce the garbage collection, since we'll have to wait our neighbour lookups become refcount-less to not depend on this stuff. Reported-by: Robert Gladewitz <gladewitz@gmx.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20	firmware: Refer to the co-maintained linux-firmware.git repository	Ben Hutchings	1	-1/+2
	David and I are sharing maintenance of this repository. Patches should be sent to both of us. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-21	SELinux: Fix RCU deref check warning in sel_netport_insert()	David Howells	1	-1/+3
	Fix the following bug in sel_netport_insert() where rcu_dereference() should be rcu_dereference_protected() as sel_netport_lock is held. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- security/selinux/netport.c:127 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by ossec-rootcheck/3323: #0: (sel_netport_lock){+.....}, at: [<ffffffff8117d775>] sel_netport_sid+0xbb/0x226 stack backtrace: Pid: 3323, comm: ossec-rootcheck Not tainted 3.1.0-rc8-fsdevel+ #1095 Call Trace: [<ffffffff8105cfb7>] lockdep_rcu_dereference+0xa7/0xb0 [<ffffffff8117d871>] sel_netport_sid+0x1b7/0x226 [<ffffffff8117d6ba>] ? sel_netport_avc_callback+0xbc/0xbc [<ffffffff8117556c>] selinux_socket_bind+0x115/0x230 [<ffffffff810a5388>] ? might_fault+0x4e/0x9e [<ffffffff810a53d1>] ? might_fault+0x97/0x9e [<ffffffff81171cf4>] security_socket_bind+0x11/0x13 [<ffffffff812ba967>] sys_bind+0x56/0x95 [<ffffffff81380dac>] ? sysret_check+0x27/0x62 [<ffffffff8105b767>] ? trace_hardirqs_on_caller+0x11e/0x155 [<ffffffff81076fcd>] ? audit_syscall_entry+0x17b/0x1ae [<ffffffff811b5eae>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81380d7b>] system_call_fastpath+0x16/0x1b Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2011-12-20	net: have ipconfig not wait if no dev is available	Gerlando Falauto	1	-0/+4
	previous commit 3fb72f1e6e6165c5f495e8dc11c5bbd14c73385c makes IP-Config wait for carrier on at least one network device. Before waiting (predefined value 120s), check that at least one device was successfully brought up. Otherwise (e.g. buggy bootloader which does not set the MAC address) there is no point in waiting for carrier. Cc: Micha Nelissen <micha@neli.hopto.org> Cc: Holger Brunck <holger.brunck@keymile.com> Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20	sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd	Thomas Graf	2	-11/+3
	When checking whether a DATA chunk fits into the estimated rwnd a full sizeof(struct sk_buff) is added to the needed chunk size. This quickly exhausts the available rwnd space and leads to packets being sent which are much below the PMTU limit. This can lead to much worse performance. The reason for this behaviour was to avoid putting too much memory pressure on the receiver. The concept is not completely irational because a Linux receiver does in fact clone an skb for each DATA chunk delivered. However, Linux also reserves half the available socket buffer space for data structures therefore usage of it is already accounted for. When proposing to change this the last time it was noted that this behaviour was introduced to solve a performance issue caused by rwnd overusage in combination with small DATA chunks. Trying to reproduce this I found that with the sk_buff overhead removed, the performance would improve significantly unless socket buffer limits are increased. The following numbers have been gathered using a patched iperf supporting SCTP over a live 1 Gbit ethernet network. The -l option was used to limit DATA chunk sizes. The numbers listed are based on the average of 3 test runs each. Default values have been used for sk_(r\|w)mem. Chunk Size Unpatched No Overhead ------------------------------------- 4 15.2 Kbit [!] 12.2 Mbit [!] 8 35.8 Kbit [!] 26.0 Mbit [!] 16 95.5 Kbit [!] 54.4 Mbit [!] 32 106.7 Mbit 102.3 Mbit 64 189.2 Mbit 188.3 Mbit 128 331.2 Mbit 334.8 Mbit 256 537.7 Mbit 536.0 Mbit 512 766.9 Mbit 766.6 Mbit 1024 810.1 Mbit 808.6 Mbit Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20	binary_sysctl(): fix memory leak	Michel Lespinasse	1	-1/+1
	binary_sysctl() calls sysctl_getname() which allocates from names_cache slab usin __getname() The matching function to free the name is __putname(), and not putname() which should be used only to match getname() allocations. This is because when auditing is enabled, putname() calls audit_putname instead (not in addition) to __putname(). Then, if a syscall is in progress, audit_putname does not release the name - instead, it expects the name to get released when the syscall completes, but that will happen only if audit_getname() was called previously, i.e. if the name was allocated with getname() rather than the naked __getname(). So, __getname() followed by putname() ends up leaking memory. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Eric Paris <eparis@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20	mm/vmalloc.c: remove static declaration of va from __get_vm_area_node	Kautuk Consul	1	-1/+1
	Static storage is not required for the struct vmap_area in __get_vm_area_node. Removing "static" to store this variable on the stack instead. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20	ipmi_watchdog: restore settings when BMC reset	Corey Minyard	1	-3/+38
	If the BMC gets reset, it will return 0x80 response errors. In less than a week # grep "Error 80 on cmd 22" /var/log/kernel \|wc -l 378681 In this case, it is probably a good idea to restore the IPMI settings. Signed-off-by: Corey Minyard <cminyard@mvista.com> Tested-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Reported-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20	oom: fix integer overflow of points in oom_badness	Frantisek Hrbata	1	-1/+1
	An integer overflow will happen on 64bit archs if task's sum of rss, swapents and nr_ptes exceeds (2^31)/1000 value. This was introduced by commit f755a04 oom: use pte pages in OOM score where the oom score computation was divided into several steps and it's no longer computed as one expression in unsigned long(rss, swapents, nr_pte are unsigned long), where the result value assigned to points(int) is in range(1..1000). So there could be an int overflow while computing 176 points = 1000; and points may have negative value. Meaning the oom score for a mem hog task will be one. 196 if (points <= 0) 197 return 1; For example: [ 3366] 0 3366 35390480 24303939 5 0 0 oom01 Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child Here the oom1 process consumes more than 24303939(rss)4096~=92GB physical memory, but it's oom score is one. In this situation the mem hog task is skipped and oom killer kills another and most probably innocent task with oom score greater than one. The points variable should be of type long instead of int to prevent the int overflow. Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20	memcg: keep root group unchanged if creation fails	Hillf Danton	1	-2/+1
	If the request is to create non-root group and we fail to meet it, we should leave the root unchanged. Signed-off-by: Hillf Danton <dhillf@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20	nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()	Haogang Chen	1	-0/+3
	There is a potential integer overflow in nilfs_ioctl_clean_segments(). When a large argv[n].v_nmembs is passed from the userspace, the subsequent call to vmalloc() will allocate a buffer smaller than expected, which leads to out-of-bound access in nilfs_ioctl_move_blocks() and lfs_clean_segments(). The following check does not prevent the overflow because nsegs is also controlled by the userspace and could be very large. if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment) goto out_free; This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and returns -EINVAL when overflow. Signed-off-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20	nilfs2: unbreak compat ioctl	Thomas Meyer	1	-0/+13
	commit 828b1c50ae ("nilfs2: add compat ioctl") incidentally broke all other NILFS compat ioctls. Make them work again. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20	cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask	David Rientjes	1	-5/+24
	Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a new set of allowed cpuset nodes where the two nodemasks, as a result of the remap, are now disjoint. c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing cpuset's mems") adds get_mems_allowed() to prevent the set of allowed nodes from changing for a thread. This causes any update to a set of allowed nodes to stall until put_mems_allowed() is called. This stall is unncessary, however, if at least one node remains unchanged in the update to the set of allowed nodes. This was addressed by 89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one node remains set"), but it's still possible that an empty nodemask may be read from a mempolicy because the old nodemask may be remapped to the new nodemask during rebind. To prevent this, only avoid the stall if there is no mempolicy for the thread being changed. This is a temporary solution until all reads from mempolicy nodemasks can be guaranteed to not be empty without the get_mems_allowed() synchronization. Also moves the check for nodemask intersection inside task_lock() so that tsk->mems_allowed cannot change. This ensures that nothing can set this tsk's mems_allowed out from under us and also protects tsk->mempolicy. Reported-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20	mfd: Include linux/io.h to jz4740-adc	Axel Lin	1	-0/+1
	Include linux/io.h to fix below build error: CC drivers/mfd/jz4740-adc.o drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_irq_demux': drivers/mfd/jz4740-adc.c:73: error: implicit declaration of function 'readb' drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_set_enabled': drivers/mfd/jz4740-adc.c:110: error: implicit declaration of function 'writeb' drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_set_config': drivers/mfd/jz4740-adc.c:146: error: implicit declaration of function 'readl' drivers/mfd/jz4740-adc.c:151: error: implicit declaration of function 'writel' drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_probe': drivers/mfd/jz4740-adc.c:249: error: implicit declaration of function 'ioremap_nocache' drivers/mfd/jz4740-adc.c:249: warning: assignment makes pointer from integer without a cast drivers/mfd/jz4740-adc.c:289: warning: passing argument 3 of 'mfd_add_devices' discards qualifiers from pointer target type include/linux/mfd/core.h:93: note: expected 'struct mfd_cell ' but argument is of type 'const struct mfd_cell ' drivers/mfd/jz4740-adc.c:299: error: implicit declaration of function 'iounmap' make[2]: * [drivers/mfd/jz4740-adc.o] Error 1 make[1]: * [drivers/mfd] Error 2 make: *** [drivers] Error 2 Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Use request_threaded_irq for twl4030-irq instead of irq_set_chained_handler	NeilBrown	1	-5/+8
	irq_set_chained_handler sets 'desc->handle_irq'. However this irq is called by handle_nested_irq from handle_twl4030_pih, and that uses action->thread_fn. So the handled set with irq_set_chained_handler is never called. So change to use request_threaded_irq instead - that sets the correct field. Tested on GTA04 Phoenux. Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Base interrupt for twl4030-irq must be one-shot	NeilBrown	1	-2/+3
	As the interrupt source is only cleared by the threaded interrupt service routine, we need to make the base interrupt IRQF_ONESHOT. Without this, the first interrupt from the TWL4030 cause the CPU to enter an infinite loop trying to handle to interrupt but never clearing it. Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Handle tps65910 clear-mask correctly	Marcus Folkesson	1	-1/+1
	The function is not actually cleaing the bitmask. Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: add #ifdef CONFIG_DEBUG_FS guard for ab8500_debug_resources	Axel Lin	1	-0/+2
	Fix below build warning if CONFIG_DEBUG_FS is disabled. CC drivers/mfd/ab8500-core.o drivers/mfd/ab8500-core.c:623: warning: 'ab8500_debug_resources' defined but not used Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Fix twl-core oops while calling twl_i2c_* for unbound driver	Ilya Yanok	1	-8/+8
	Check inuse variable before trying to access twl_map to prevent dereferencing of uninitialized variable. Signed-off-by: Ilya Yanok <yanok@emcraft.com> Cc: stable@kernel.org Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: include linux/module.h for ab5500-debugfs	Axel Lin	1	-1/+1
	Include linux/module.h to fix below build error: CC drivers/mfd/ab5500-debugfs.o drivers/mfd/ab5500-debugfs.c:571: error: 'THIS_MODULE' undeclared here (not in a function) make[2]: *** [drivers/mfd/ab5500-debugfs.o] Error 1 Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Update wm8994 active device checks for WM1811	Mark Brown	1	-0/+1
	This didn't go in as part of the original MFD patch for WM1811 due to cross tree issues. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Set tps6586x bits if new value is different from the old one	Axel Lin	1	-1/+1
	It does not make sense to write new value only when all the bit_mask bits are zero. We need to write new value if the bit mask fields of new value is not equal to old value. Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Set da903x bits if new value is different from the old one	Axel Lin	1	-1/+1
	It does not make sense to write new value only when all the bit_mask bits are zero. We need to write new value if the bit mask fields of new value is not equal to old value. Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Set adp5520 bits if new value is different from the old one	Axel Lin	1	-1/+1
	Current code checks if all the bit_mask bits are all zero is wrong. We need to write new value if the bit mask fields of new value is not equal to old value. Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	mfd: Add missed free_irq in da903x_remove	Axel Lin	1	-0/+1
	Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-12-20	evm: prevent racing during tfm allocation	Dmitry Kasatkin	1	-0/+9
	There is a small chance of racing during tfm allocation. This patch fixes it. Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-12-20	evm: key must be set once during initialization	Dmitry Kasatkin	1	-4/+6
	On multi-core systems, setting of the key before every caclculation, causes invalid HMAC calculation for other tfm users, because internal state (ipad, opad) can be invalid before set key call returns. It needs to be set only once during initialization. Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-12-19	mmc: vub300: fix type of firmware_rom_wait_states module parameter	Rusty Russell	1	-1/+1
	You didn't mean this to be a bool. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Tony Olech <tony.olech@elandigitalsystems.com> Cc: <stable@kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-12-19	Revert "mmc: enable runtime PM by default"	Ohad Ben-Cohen	1	-11/+0
	When SDIO runtime PM was originally introduced, we immediately faced two regressions with two different chipsets, and in response decided not to enable it by default. With the recent work on the 8686 we hoped we found all the gotchas, so 08da834 did make sense (at least experimentally). Unfortunately we now see that some setups out there still refuse to work when SDIO runtime PM is enabled by default (see http://www.spinics.net/lists/linux-mmc/msg11161.html), and obviously we can't live with these kind of regressions. This reverts commit 08da834a24312157f512224691ad1fddd11c1073. Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: Daniel Drake <dsd@laptop.org> Signed-off-by: Chris Ball <cjb@laptop.org>