linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2012-03-06	cxgb4: Add support for Chelsio's T480-CR and T440-LP-CR adapters	Vipul Pandya	1	-0/+2
	This patch adds PCI device ids for Chelsio's T480-CR and T440-LP-CR adapters. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	mlx4_core: remove buggy sched_queue masking	Yevgeny Petrilin	1	-5/+0
	Fixes a bug introduced by commit fe9a2603c, where the priority bits in the schedule queue field were masked out. Signed-off-by: Amir Vadai <amirv@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	netfilter: nf_conntrack: fix early_drop with reliable event delivery	Pablo Neira Ayuso	1	-2/+6
	If reliable event delivery is enabled and ctnetlink fails to deliver the destroy event in early_drop, the conntrack subsystem cannot drop any the candidate flow that was planned to be evicted. Reported-by: Kerin Millar <kerframil@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	bridge: netfilter: don't call iptables on vlan packets if sysctl is off	Florian Westphal	1	-14/+18
	When net.bridge.bridge-nf-filter-vlan-tagged is 0 (default), vlan packets arriving should not be sent to ip(6)tables by bridge netfilter. However, it turns out that we currently always send VLAN packets to netfilter, if .. a), CONFIG_VLAN_8021Q is enabled ; or b), CONFIG_VLAN_8021Q is not set but rx vlan offload is enabled on the bridge port. This is because bridge netfilter treats skb with skb->protocol == ETH_P_IP{V6} as "non-vlan packet". With rx vlan offload on or CONFIG_VLAN_8021Q=y, the vlan header has already been removed here, and we cannot rely on skb->protocol alone. Fix this by only using skb->protocol if the skb has no vlan tag, or if a vlan tag is present and filter-vlan-tagged bridge netfilter sysctl is enabled. We cannot remove the skb->protocol == htons(ETH_P_8021Q) test because the vlan tag is still around in the CONFIG_VLAN_8021Q=n && "ethtool -K $itf rxvlan off" case. reproducer: iptables -t raw -I PREROUTING -i br0 iptables -t raw -I PREROUTING -i br0.1 Then send packets to an ip address configured on br0.1 interface. Even with net.bridge.bridge-nf-filter-vlan-tagged=0, the 1st rule will match instead of the 2nd one. With this patch applied, the 2nd rule will match instead. In the non-local address case, netfilter won't be consulted after this patch unless the sysctl is switched on. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	netfilter: bridge: fix wrong pointer dereference	Pablo Neira Ayuso	1	-1/+1
	In adf7ff8, a invalid dereference was added in ebt_make_names. CC [M] net/bridge/netfilter/ebtables.o net/bridge/netfilter/ebtables.c: In function `ebt_make_names': net/bridge/netfilter/ebtables.c:1371:20: warning: `t' may be used uninitialized in this function [-Wuninitialized] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	netfilter: ctnetlink: remove incorrect spin_[un]lock_bh on NAT module autoload	Pablo Neira Ayuso	1	-3/+0
	Since 7d367e0, ctnetlink_new_conntrack is called without holding the nf_conntrack_lock spinlock. Thus, ctnetlink_parse_nat_setup does not require to release that spinlock anymore in the NAT module autoload case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	netfilter: ebtables: fix wrong name length while copying to user-space	Santosh Nayak	1	-3/+13
	user-space ebtables expects 32 bytes-long names, but xt_match names use 29 bytes. We have to copy less 29 bytes and then, make sure we fill the remaining bytes with zeroes. Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	r8169: runtime resume before shutdown.	françois romieu	1	-0/+5
	With runtime PM, if the ethernet cable is disconnected, the device is transitioned to D3 state to conserve energy. If the system is shutdown in this state, any register accesses in rtl_shutdown are dropped on the floor. As the device was programmed by .runtime_suspend() to wake on link changes, it is thus brought back up as soon as the link recovers. Resuming every suspended device through the driver core would slow things down and it is not clear how many devices really need it now. Original report and D0 transition patch by Sameer Nanda. Patch has been changed to comply with advices by Rafael J. Wysocki and the PM folks. Reported-by: Sameer Nanda <snanda@chromium.org> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Hayes Wang <hayeswang@realtek.com> Cc: Alan Stern <stern@rowland.harvard.edu> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una	Neal Cardwell	1	-0/+4
	This commit fixes tcp_shift_skb_data() so that it does not shift SACKed data below snd_una. This fixes an issue whose symptoms exactly match reports showing tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at net/ipv4/tcp_input.c:3418" thread on netdev). Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41) tcp_shift_skb_data() had been shifting SACKed ranges that were below snd_una. It checked that the end of the skb it was about to shift from was above snd_una, but did not check that the end of the actual shifted range was above snd_una; this commit adds that check. Shifting SACKed ranges below snd_una is problematic because for such ranges tcp_sacktag_one() short-circuits: it does not declare anything as SACKed and does not increase sacked_out. Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9 and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges below snd_una happened to work because tcp_shifted_skb() was always (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence tcp_sacktag_one() never short-circuited and always increased tp->sacked_out in this case. After those two fixes, my testing has verified that shifting SACKed ranges below snd_una could cause tp->sacked_out to go negative with the following sequence of events: (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una, then shifts a prefix of that skb that is below snd_una (2) tcp_shifted_skb() increments the packet count of the already-SACKed prev sk_buff (3) tcp_sacktag_one() sees the end of the new SACKed range is below snd_una, so it short-circuits and doesn't increase tp->sacked_out (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed, decrements tp->sacked_out by this "inflated" pcount that was missing a matching increase in tp->sacked_out, and hence tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted to s32 is negative. (6) this leads to the warnings seen in the recent "WARNING: at net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.: tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0); More generally, I think this bug can be tickled in some cases where two or more ACKs from the receiver are lost and then a DSACK arrives that is immediately above an existing SACKed skb in the write queue. This fix changes tcp_shift_skb_data() to abort this sequence at step (1) in the scenario above by noticing that the bytes are below snd_una and not shifting them. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06	tg3: Fix to use multi queue BQL interfaces	Tom Herbert	1	-3/+3
	Fix tg3 to use BQL multi queue related netdev interfaces since the device supports multi queue. Signed-off-by: Tom Herbert <therbert@google.com> Reported-by: Christoph Lameter <cl@gentwo.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-05	rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler	Alexandre Bounine	1	-2/+3
	Fix a bug that causes a kernel panic when the number of received doorbells is larger than number of entries in the inbound doorbell queue (current default value = 512). Another possible indication for this bug is large number of spurious doorbells reported by tsi721 driver after reaching the queue size maximum. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Chul Kim <chul.kim@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: <stable@vger.kernel.org> [3.2.x+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	memcg: fix mapcount check in move charge code for anonymous page	Naoya Horiguchi	1	-1/+1
	Currently the charge on shared anonyous pages is supposed not to moved in task migration. To implement this, we need to check that mapcount > 1, instread of > 2. So this patch fixes it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	mm: thp: fix BUG on mm->nr_ptes	Andrea Arcangeli	1	-3/+3
	Dave Jones reports a few Fedora users hitting the BUG_ON(mm->nr_ptes...) in exit_mmap() recently. Quoting Hugh's discovery and explanation of the SMP race condition: "mm->nr_ptes had unusual locking: down_read mmap_sem plus page_table_lock when incrementing, down_write mmap_sem (or mm_users 0) when decrementing; whereas THP is careful to increment and decrement it under page_table_lock. Now most of those paths in THP also hold mmap_sem for read or write (with appropriate checks on mm_users), but two do not: when split_huge_page() is called by hwpoison_user_mappings(), and when called by add_to_swap(). It's conceivable that the latter case is responsible for the exit_mmap() BUG_ON mm->nr_ptes that has been reported on Fedora." The simplest way to fix it without having to alter the locking is to make split_huge_page() a noop in nr_ptes terms, so by counting the preallocated pagetables that exists for every mapped hugepage. It was an arbitrary choice not to count them and either way is not wrong or right, because they are not used but they're still allocated. Reported-by: Dave Jones <davej@redhat.com> Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Josh Boyer <jwboyer@redhat.com> Cc: <stable@vger.kernel.org> [3.0.x, 3.1.x, 3.2.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	alpha: fix 32/64-bit bug in futex support	Andrew Morton	1	-1/+1
	Michael Cree said: : : I have noticed some user space problems (pulseaudio crashes in pthread : : code, glibc/nptl test suite failures, java compiler freezes on SMP alpha : : systems) that arise when using a 2.6.39 or later kernel on Alpha. : : Bisecting between 2.6.38 and 2.6.39 (using glibc/nptl test suite as : : criterion for good/bad kernel) eventually leads to: : : : : 8d7718aa082aaf30a0b4989e1f04858952f941bc is the first bad commit : : commit 8d7718aa082aaf30a0b4989e1f04858952f941bc : : Author: Michel Lespinasse <walken@google.com> : : Date: Thu Mar 10 18:50:58 2011 -0800 : : : : futex: Sanitize futex ops argument types : : : : Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic : : prototypes to use u32 types for the futex as this is the data type the : : futex core code uses all over the place. : : : : Looking at the commit I see there is a change of the uaddr argument in : : the Alpha architecture specific code for futexes from int to u32, but I : : don't see why this should cause a problem. Richard Henderson said: : futex_atomic_cmpxchg_inatomic(u32 uval, u32 __user uaddr, : u32 oldval, u32 newval) : ... : : "r"(uaddr), "r"((long)oldval), "r"(newval) : : : There is no 32-bit compare instruction. These are implemented by : consistently extending the values to a 64-bit type. Since the : load instruction sign-extends, we want to sign-extend the other : quantity as well (despite the fact it's logically unsigned). : : So: : : - : "r"(uaddr), "r"((long)oldval), "r"(newval) : + : "r"(uaddr), "r"((long)(int)oldval), "r"(newval) : : should do the trick. Michael said: : This fixes the glibc test suite failures and the pulseaudio related : crashes, but it does not fix the java compiiler lockups that I was (and : are still) observing. That is some other problem. Reported-by: Michael Cree <mcree@orcon.net.nz> Tested-by: Michael Cree <mcree@orcon.net.nz> Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Michel Lespinasse <walken@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	memcg: fix GPF when cgroup removal races with last exit	Hugh Dickins	6	-48/+18
	When moving tasks from old memcg (with move_charge_at_immigrate on new memcg), followed by removal of old memcg, hit General Protection Fault in mem_cgroup_lru_del_list() (called from release_pages called from free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from exit_mmap from mmput from exit_mm from do_exit). Somewhat reproducible, takes a few hours: the old struct mem_cgroup has been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is still trying to update its stats, and take page off lru before freeing. A task, or a charge, or a page on lru: each secures a memcg against removal. In this case, the last task has been moved out of the old memcg, and it is exiting: anonymous pages are uncharged one by one from the memcg, as they are zapped from its pagetables, so the charge gets down to 0; but the pages themselves are queued in an mmu_gather for freeing. Most of those pages will be on lru (and force_empty is careful to lru_add_drain_all, to add pages from pagevec to lru first), but not necessarily all: perhaps some have been isolated for page reclaim, perhaps some isolated for other reasons. So, force_empty may find no task, no charge and no page on lru, and let the removal proceed. There would still be no problem if these pages were immediately freed; but typically (and the put_page_testzero protocol demands it) they have to be added back to lru before they are found freeable, then removed from lru and freed. We don't see the issue when adding, because the mem_cgroup_iter() loops keep their own reference to the memcg being scanned; but when it comes to mem_cgroup_lru_del_list(). I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and PageCgroupUsed flags were used (like a trick with mirrors) to deflect view of pc->mem_cgroup to the stable root_mem_cgroup when neither set. 38c5d72f3ebe ("memcg: simplify LRU handling by new rule") mercifully removed those convolutions, but left this General Protection Fault. But it's surprisingly easy to restore the old behaviour: just check PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec to add), and reset pc to root_mem_cgroup if page is uncharged. A risky change? just going back to how it worked before; testing, and an audit of uses of pc->mem_cgroup, show no problem. And there's a nice bonus: with mem_cgroup_lru_add_list() itself making sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no longer has any purpose, and we can safely revert 4e5f01c2b9b9 ("memcg: clear pc->mem_cgroup if necessary"). Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c is not strictly necessary: the lru_lock there, with RCU before memcg structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe without that; but it seems cleaner to rely on one dependency less. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	debugobjects: Fix selftest for static warnings	Stephen Boyd	1	-11/+3
	debugobjects is now printing a warning when a fixup for a NOTAVAILABLE object is run. This causes the selftest to fail like: ODEBUG: selftest warnings failed 4 != 5 We could just increase the number of warnings that the selftest is expecting to see because that is actually what has changed. But, it turns out that fixup_activate() was written with inverted logic and thus a fixup for a static object returned 1 indicating the object had been fixed, and 0 otherwise. Fix the logic to be correct and update the counts to reflect that nothing needed fixing for a static object. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	floppy/scsi: fix setting of BIO flags	Muthu Kumar	2	-2/+2
	Fix setting bio flags in drivers (sd_dif/floppy). Signed-off-by: Muthukumar R <muthur@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	memcg: fix deadlock by inverting lrucare nesting	Hugh Dickins	1	-35/+37
	We have forgotten the rules of lock nesting: the irq-safe ones must be taken inside the non-irq-safe ones, otherwise we are open to deadlock: CPU0 CPU1 ---- ---- lock(&(&pc->lock)->rlock); local_irq_disable(); lock(&(&zone->lru_lock)->rlock); lock(&(&pc->lock)->rlock); <Interrupt> lock(&(&zone->lru_lock)->rlock); To check a different locking issue, I happened to add a spin_lock to memcg's bit_spin_lock in lock_page_cgroup(), and lockdep very quickly complained about __mem_cgroup_commit_charge_lrucare() (on CPU1 above). So delete __mem_cgroup_commit_charge_lrucare(), passing a bool lrucare to __mem_cgroup_commit_charge() instead, taking zone->lru_lock under lock_page_cgroup() in the lrucare case. The original was using spin_lock_irqsave, but we'd be in more trouble if it were ever called at interrupt time: unconditional _irq is enough. And ClearPageLRU before del from lru, SetPageLRU before add to lru: no strong reason, but that is the ordering used consistently elsewhere. Fixes 36b62ad539498d00c2d280a151a ("memcg: simplify corner case handling of LRU"). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()	Anatolij Gustschin	1	-7/+7
	If probing the RTC didn't succeed due to failed RTC register access, the RTC device will be unregistered. Then, when removing the module r9701_remove() causes a kernel crash while trying to unregister a not registered RTC device. Fix this by doing RTC register access test before RTC device registration. Signed-off-by: Anatolij Gustschin <agust@denx.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	c2port: class_create() returns an ERR_PTR	Dan Carpenter	1	-2/+2
	class_create() doesn't return a NULL, it only returns ERR_PTRs. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	pps: class_create() returns an ERR_PTR, not NULL	Dan Carpenter	1	-2/+2
	class_create() never returns NULLs only ERR_PTRs. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	hung_task: fix the broken rcu_lock_break() logic	Oleg Nesterov	1	-4/+7
	check_hung_uninterruptible_tasks()->rcu_lock_break() introduced by "softlockup: check all tasks in hung_task" commit ce9dbe24 looks absolutely wrong. - rcu_lock_break() does put_task_struct(). If the task has exited it is not safe to even read its ->state, nothing protects this task_struct. - The TASK_DEAD checks are wrong too. Contrary to the comment, we can't use it to check if the task was unhashed. It can be unhashed without TASK_DEAD, or it can be valid with TASK_DEAD. For example, an autoreaping task can do release_task(current) long before it sets TASK_DEAD in do_exit(). Or, a zombie task can have ->state == TASK_DEAD but release_task() was not called, and in this case we must not break the loop. Change this code to check pid_alive() instead, and do this before we drop the reference to the task_struct. Note: while_each_thread() under rcu_read_lock() is not really safe, it can livelock. This will be fixed later, but fortunately in this case the "max_count" logic saves us anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mandeep Singh Baines <msb@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	vfork: kill PF_STARTING	Oleg Nesterov	2	-10/+0
	Previously it was (ab)used by utrace. Then it was wrongly used by the scheduler code. Currently it is not used, kill it before it finds the new erroneous user. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	coredump_wait: don't call complete_vfork_done()	Oleg Nesterov	3	-14/+3
	Now that CLONE_VFORK is killable, coredump_wait() no longer needs complete_vfork_done(). zap_threads() should find and kill all tasks with the same ->mm, this includes our parent if ->vfork_done is set. mm_release() becomes the only caller, unexport complete_vfork_done(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	vfork: make it killable	Oleg Nesterov	2	-9/+33
	Make vfork() killable. Change do_fork(CLONE_VFORK) to do wait_for_completion_killable(). If it fails we do not return to the user-mode and never touch the memory shared with our child. However, in this case we should clear child->vfork_done before return, we use task_lock() in do_fork()->wait_for_vfork_done() and complete_vfork_done() to serialize with each other. Note: now that we use task_lock() we don't really need completion, we could turn task->vfork_done into "task_struct *wake_up_me" but this needs some complications. NOTE: this and the next patches do not affect in-kernel users of CLONE_VFORK, kernel threads run with all signals ignored including SIGKILL/SIGSTOP. However this is obviously the user-visible change. Not only a fatal signal can kill the vforking parent, a sub-thread can do execve or exit_group() and kill the thread sleeping in vfork(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	vfork: introduce complete_vfork_done()	Oleg Nesterov	3	-13/+13
	No functional changes. Move the clear-and-complete-vfork_done code into the new trivial helper, complete_vfork_done(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	aio: wake up waiters when freeing unused kiocbs	Jeff Moyer	1	-0/+2
	Bart Van Assche reported a hung fio process when either hot-removing storage or when interrupting the fio process itself. The (pruned) call trace for the latter looks like so: fio D 0000000000000001 0 6849 6848 0x00000004 ffff880092541b88 0000000000000046 ffff880000000000 ffff88012fa11dc0 ffff88012404be70 ffff880092541fd8 ffff880092541fd8 ffff880092541fd8 ffff880128b894d0 ffff88012404be70 ffff880092541b88 000000018106f24d Call Trace: schedule+0x3f/0x60 io_schedule+0x8f/0xd0 wait_for_all_aios+0xc0/0x100 exit_aio+0x55/0xc0 mmput+0x2d/0x110 exit_mm+0x10d/0x130 do_exit+0x671/0x860 do_group_exit+0x44/0xb0 get_signal_to_deliver+0x218/0x5a0 do_signal+0x65/0x700 do_notify_resume+0x65/0x80 int_signal+0x12/0x17 The problem lies with the allocation batching code. It will opportunistically allocate kiocbs, and then trim back the list of iocbs when there is not enough room in the completion ring to hold all of the events. In the case above, what happens is that the pruning back of events ends up freeing up the last active request and the context is marked as dead, so it is thus responsible for waking up waiters. Unfortunately, the code does not check for this condition, so we end up with a hung task. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@kernel.org> [3.2.x only] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	kprobes: return proper error code from register_kprobe()	Prashanth Nageshappa	1	-5/+7
	register_kprobe() aborts if the address of the new request falls in a prohibited area (such as ftrace pouch, __kprobes annotated functions, non-kernel text addresses, jump label text). We however don't return the right error on this abort, resulting in a silent failure - incorrect adding/reporting of kprobes ('perf probe do_fork+18' or 'perf probe mcount' for instance). In V2 we are incorporating Masami Hiramatsu's feedback. This patch fixes it by returning -EINVAL upon failure. While we are here, rename the label used for exit to be more appropriate. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Prashanth K Nageshappa <prashanth@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jason Baron <jbaron@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	kmsg_dump: don't run on non-error paths by default	Matthew Garrett	3	-2/+19
	Since commit 04c6862c055f ("kmsg_dump: add kmsg_dump() calls to the reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets run on normal paths including poweroff and reboot. This is less than ideal given pstore implementations that can only represent single backtraces, since a reboot may overwrite a stored oops before it's been picked up by userspace. In addition, some pstore backends may have low performance and provide a significant delay in reboot as a result. This patch adds a printk.always_kmsg_dump kernel parameter (which can also be changed from userspace). Without it, the code will only be run on failure paths rather than on normal paths. The option can be enabled in environments where there's a desire to attempt to audit whether or not a reboot was cleanly requested or not. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Marco Stornelli <marco.stornelli@gmail.com> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-06	md/raid10: fix assembling of arrays with replacement devices.	NeilBrown	1	-1/+0
	commit 56a2559bb654a (md/raid10: recognise replacements ...) changed 'run' to set ->replacement or ->rdev depending on the 'Replacement' status if the device, but it didn't remove the old unconditional setting of 'rdev'. So it was largely ineffective. So remove that now. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-05	drm, gma500: Fix Cedarview boot failures in 3.3-rc	Alan Cox	3	-6/+6
	Production GMA3600/3650 hardware turns out to be subtly different to the development platforms. This combined with a minor driver bug is causing the kernel to hang on these platforms. This patch does the following - turn down a couple of messages that were meant to be debug and are causing much confusion - ensure the hotplug interrupt is disabled on Cedartrail systems. - fix a bug where gtt roll mode called psbfb_sync, which tries to sync the 2D engine. On other devices it is harmless as the 2D engine is present but not in use when in gtt roll mode, on Cedartrail it causes a hang Without these changes 3.3-rc hangs on boot on Cedartrail based systems. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held	Al Viro	1	-1/+1
	All other callers already hold either ->mmap_sem (exclusive) or ->page_table_lock. And we need it because some page table flushing instanced do work explicitly with ge tables. See e.g. arch/powerpc/mm/tlb_hash32.c, flush_tlb_range() and flush_range() in there. The same goes for uml, with a lot more extensive playing with page tables. Almost all callers are actually fine - flush_tlb_range() may have no need to bother playing with page tables, but it can do so safely; again, this caller is the sole exception - everything else either has exclusive ->mmap_sem on the mm in question, or mm->page_table_lock is held. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	VM_GROWS{UP,DOWN} shouldn't be set on shmem VMAs	Al Viro	1	-0/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	aout: move setup_arg_pages() prior to reading/mapping the binary	Al Viro	2	-14/+14
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05	qla3xxx: ethernet: Fix bogus interrupt state flag.	Santosh Nayak	1	-3/+2
	In 'ql_adapter_initialize' the first call for 'spin_unlock_irqrestore()' is with hw_flags = 0, which is as good as 'spin_unlock_irq()' (unconditional interrupt enabling). If this is intended, then for better performance 'spin_unlock_irqrestore()' can be replaced with 'spin_unlock_irq()' and 'spin_lock_irqsave()' can be replaced by 'spin_lock_irq() Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-05	bridge: check return value of ipv6_dev_get_saddr()	Ulrich Weber	1	-2/+5
	otherwise source IPv6 address of ICMPV6_MGM_QUERY packet might be random junk if IPv6 is disabled on interface or link-local address is not yet ready (DAD). Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-05	HID: hid-input: allow array fields out of range	Nikolai Kondrashov	1	-2/+7
	Allow array field values out of range as per HID 1.11 specification, section 6.2.25: Rather than returning a single bit for each button in the group, an array returns an index in each field that corresponds to the pressed button (like keyboard scan codes). An out-of range value in and array field is considered no controls asserted. Apparently, "and" above is a typo and should be "an". This fixes at least Waltop tablet pen clicks - otherwise BTN_TOUCH is never released. The relevant part of Waltop tablet report descriptors is this: 0x09, 0x42, /* Usage (Tip Switch), / 0x09, 0x44, / Usage (Barrel Switch), / 0x09, 0x46, / Usage (Tablet Pick), / 0x15, 0x01, / Logical Minimum (1), / 0x25, 0x03, / Logical Maximum (3), / 0x75, 0x04, / Report Size (4), / 0x95, 0x01, / Report Count (1), / 0x80, / Input, */ This is a regression fix for commit b4b583d ("HID: be more strict when ignoring out-of-range fields"). Signed-off-by: Nikolai Kondrashov <spbnick@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-03-04	rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()	Eric Dumazet	1	-8/+10
	nlmsg_parse() might return an error, so test its return value before potential random memory accesses. Errors introduced in commit 115c9b81928 (rtnetlink: Fix problem with buffer allocation) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-04	bridge: message age needs to increase, not decrease.	Joakim Tjernlund	1	-1/+1
	commit bridge: send proper message_age in config BPDU added this gem: bpdu.message_age = (jiffies - root->designated_age) p->designated_age = jiffies + bpdu->message_age; Notice how bpdu->message_age is negated when reassigned to bpdu.message_age. This causes message age to decrease breaking the STP protocol. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-04	bridge: Adjust min age inc for HZ > 256	Joakim Tjernlund	1	-2/+2
	min age increment needs to round up its min age tick for all HZ values to guarantee message age is increasing. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-04	vfs: move dentry_cmp from <linux/dcache.h> to fs/dcache.c	Linus Torvalds	2	-20/+20
	It's only used inside fs/dcache.c, and we're going to play games with it for the word-at-a-time patches. This time we really don't even want to export it, because it really is an internal function to fs/dcache.c, and has been since it was introduced. Having it in that extremely hot header file (it's included in pretty much everything, thanks to <linux/fs.h>) is a disaster for testing different versions, and is utterly pointless. We really should have some kind of header file diet thing, where we figure out which parts of header files are really better off private and only result in more expensive compiles. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-04	percpu: fix __this_cpu_{sub,inc,dec}_return() definition	Konstantin Khlebnikov	1	-3/+3
	This patch adds missed "__" prefixes, otherwise these functions works as irq/preemption safe. Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-03-04	mmc: core: Fixup suspend/resume issues for UHS-I cards	Ulf Hansson	4	-5/+17
	Even if cards supports 1.8V I/O voltage those should anyway be initialized at 3.3V I/O according to (e)MMC, SD and SDIO specs. Some eMMC and embedded SDIO devices are able to be initialized at 1.8V as well, but it is better to be safe. Do note that initialization in this context means that the card has been completely powered off, otherwise the card will remain at the last I/O voltage level that were negotitiated. Due to the above being taken care of the suspend/resume issues for UHS-I SD-cards has been fixed. Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-03-04	mmc: mmci: reduce max_blk_count to avoid overflowing max_req_size	Will Deacon	1	-3/+4
	On a system with large pages (64k in my case), the following BUG is triggered in MMC core: [ 2.338023] BUG: failure at drivers/mmc/core/core.c:221/mmc_start_request()! [ 2.338102] Kernel panic - not syncing: BUG! [ 2.338155] Call trace: [ 2.338228] [<ffffffc00008635c>] dump_backtrace+0x0/0x120 [ 2.338317] [<ffffffc0003365ec>] dump_stack+0x14/0x1c [ 2.338403] [<ffffffc000336990>] panic+0xbc/0x1f0 [ 2.338498] [<ffffffc00027a494>] mmc_start_request+0x154/0x184 [ 2.338600] [<ffffffc00027abdc>] mmc_start_req+0x110/0x140 [ 2.338701] [<ffffffc00028604c>] mmc_blk_issue_rw_rq+0x7c/0x39c [ 2.338804] [<ffffffc00028652c>] mmc_blk_issue_rq+0x1c0/0x468 [ 2.338905] [<ffffffc000287564>] mmc_queue_thread+0x68/0x118 [ 2.338995] [<ffffffc0000bc308>] kthread+0x84/0x8c This is because of a 64k request with a max_req_size of 64k-1 bytes. The following patch fixes the problem by limiting the max_blk_count such that max_blk_count * max_blk_size == max_req_size. I couldn't pursuade the compiler to emit a shift instead of a div without encoding the shift explicitly. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-03-04	mmc: sdhci-esdhc-imx: fix for mmc cards on i.MX5	Sascha Hauer	1	-2/+3
	On i.MX53 we have to write a special SDHCI_CMD_ABORTCMD to the SDHCI_TRANSFER_MODE register during a MMC_STOP_TRANSMISSION command. This works for SD cards. However, with MMC cards the MMC_SET_BLOCK_COUNT command is used instead, but this needs the same handling. Fix MMC cards by testing for the MMC_SET_BLOCK_COUNT command aswell. Tested on a custom i.MX53 board with a Transcend MMC+ card and eMMC. The kernel started used MMC_SET_BLOCK_COUNT in 3.0, so this is a regression for these boards introduced in 3.0; it should go to 3.0/3.1/3.2-stable. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-03-04	mmc: core: fix regression: set default clock gating delay to 0	Guennadi Liakhovetski	1	-2/+2
	A recent commit "mmc: core: Use delayed work in clock gating framework" (597dd9d79cfbbb1) introduced a default 200ms delay before clock gating actually takes place. This means that every time an MMC interface becomes idle it first stays on for 200ms before gating its clock. This leads to increased power consumption and is therefore a clear regression. This patch restores the original behaviour by setting the default delay to 0. Users prioritising throughput over power efficiency can still modify the delay via sysfs. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-03-04	MAINTAINERS: hand over atmel-mci (sd/mmc interface)	Nicolas Ferre	1	-2/+2
	Modify MAINTAINERS entry for Atmel SD/MMC drivers. I hand the atmel-mci and at91_mci drivers over to Ludovic. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-03-03	Linux 3.3-rc6	Linus Torvalds	1	-1/+1

2012-03-03	tcp: don't fragment SACKed skbs in tcp_mark_head_lost()	Neal Cardwell	1	-0/+1
	In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb to mark the first portion as lost. This is for two primary reasons: (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When doing this, it preserves the sum of their packet counts in order to reflect the real-world dynamics on the wire. But given that skbs can have remainders that do not align to MSS boundaries, this packet count preservation means that for SACKed skbs there is not necessarily a direct linear relationship between tcp_skb_pcount(skb) and skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment off and mark as lost a prefix of length (packets - oldcnt)*mss from SACKed skbs were leading to occasional failures of the WARN_ON(len > skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the recent "crash in tcp_fragment" thread on netdev). (2) there is no real point in fragmenting off part of a SACKed skb and calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP for SACKed skbs. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-03	perf tools: Handle kernels that don't support attr.exclude_{guest,host}	Arnaldo Carvalho de Melo	4	-15/+39
	Just fall back to resetting those fields, if set, warning the user that that feature is not available. If guest samples appear they will just be discarded because no struct machine will be found and thus the event will be accounted as not handled and dropped, see 0c09571. Reported-by: Namhyung Kim <namhyung@gmail.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-vuwxig36mzprl5n7nzvnxxsh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>