linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2014-06-17	selinux, kbuild: remove unnecessary $(hostprogs-y) from clean-files	Masahiro Yamada	1	-1/+0
	Files added to hostprogs-y are cleaned. (See scripts/Makefile.clean) Adding them to clean-files is redundant. Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-08	Linux 3.15	Linus Torvalds	1	-1/+1

2014-06-08	Revert "x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"	Linus Torvalds	2	-47/+79
	This reverts commit 3e1a878b7ccdb31da6d9d2b855c72ad87afeba3f. It came in very late, and already has one reported failure: Sitsofe reports that the current tree fails to boot on his EeePC, and bisected it down to this. Rather than waste time trying to figure out what's wrong, just revert it. Reported-by: Sitsofe Wheeler <sitsofe@gmail.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-07	x86/boot: EFI_MIXED should not prohibit loading above 4G	Matt Fleming	1	-2/+1
	commit 7d453eee36ae ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a regression for the functionality to load kernels above 4G. The relevant (incorrect) reasoning behind this change can be seen in the commit message, "The xloadflags field in the bzImage header is also updated to reflect that the kernel supports both entry points by setting both of XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y. XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is guaranteed to be addressable with 32-bits." This is obviously bogus since 32-bit EFI loaders will never place the kernel above the 4G mark. So this restriction is entirely unnecessary. But things are worse than that - since we want to encourage people to always compile with CONFIG_EFI_MIXED=y so that their kernels work out of the box for both 32-bit and 64-bit firmware, commit 7d453eee36ae effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely. Remove the overzealous and superfluous restriction and restore the XLF_CAN_BE_LOADED_ABOVE_4G functionality. Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-06-06	mm: add !pte_present() check on existing hugetlb_entry callbacks	Naoya Horiguchi	2	-2/+6
	The age table walker doesn't check non-present hugetlb entry in common path, so hugetlb_entry() callbacks must check it. The reason for this behavior is that some callers want to handle it in its own way. [ I think that reason is bogus, btw - it should just do what the regular code does, which is to call the "pte_hole()" function for such hugetlb entries - Linus] However, some callers don't check it now, which causes unpredictable result, for example when we have a race between migrating hugepage and reading /proc/pid/numa_maps. This patch fixes it by adding !pte_present checks on buggy callbacks. This bug exists for years and got visible by introducing hugepage migration. ChangeLog v2: - fix if condition (check !pte_present() instead of pte_present()) Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Backported to 3.15. Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06	Btrfs: send, fix corrupted path strings for long paths	Filipe Manana	1	-2/+5
	If a path has more than 230 characters, we allocate a new buffer to use for the path, but we were forgotting to copy the contents of the previous buffer into the new one, which has random content from the kmalloc call. Test: mkfs.btrfs -f /dev/sdd mount /dev/sdd /mnt TEST_PATH="/mnt/fdmanana/.config/google-chrome-mysetup/Default/Pepper_Data/Shockwave_Flash/WritableRoot/#SharedObjects/JSHJ4ZKN/s.wsj.net/[[IMPORT]]/players.edgesuite.net/flash/plugins/osmf/advanced-streaming-plugin/v2.7/osmf1.6/Ak#" mkdir -p $TEST_PATH echo "hello world" > $TEST_PATH/amaiAdvancedStreamingPlugin.txt btrfs subvolume snapshot -r /mnt /mnt/mysnap1 btrfs send /mnt/mysnap1 -f /tmp/1.snap A test for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Cc: Marc Merlin <marc@merlins.org> Tested-by: Marc MERLIN <marc@merlins.org> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-06	mm: rmap: fix use-after-free in __put_anon_vma	Andrey Ryabinin	1	-2/+1
	While working address sanitizer for kernel I've discovered use-after-free bug in __put_anon_vma. For the last anon_vma, anon_vma->root freed before child anon_vma. Later in anon_vma_free(anon_vma) we are referencing to already freed anon_vma->root to check rwsem. This fixes it by freeing the child anon_vma before freeing anon_vma->root. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> # v3.0+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06	target: Fix alua_access_state attribute OOPs for un-configured devices	Nicholas Bellinger	1	-0/+5
	This patch fixes a OOPs where an attempt to write to the per-device alua_access_state configfs attribute at: /sys/kernel/config/target/core/$HBA/$DEV/alua/$TG_PT_GP/alua_access_state results in an NULL pointer dereference when the backend device has not yet been configured. This patch adds an explicit check for DF_CONFIGURED, and fails with -ENODEV to avoid this case. Reported-by: Chris Boot <crb@tiger-computing.co.uk> Reported-by: Philip Gaw <pgaw@darktech.org.uk> Cc: Chris Boot <crb@tiger-computing.co.uk> Cc: Philip Gaw <pgaw@darktech.org.uk> Cc: stable@vger.kernel.org # 3.8+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-06	target: Allow READ_CAPACITY opcode in ALUA Standby access state	Nicholas Bellinger	1	-0/+9
	This patch allows READ_CAPACITY + SAI_READ_CAPACITY_16 opcode processing to occur while the associated ALUA group is in Standby access state. This is required to avoid host side LUN probe failures during the initial scan if an ALUA group has already implicitly changed into Standby access state. This addresses a bug reported by Chris + Philip using dm-multipath + ESX hosts configured with ALUA multipath. Reported-by: Chris Boot <crb@tiger-computing.co.uk> Reported-by: Philip Gaw <pgaw@darktech.org.uk> Cc: Chris Boot <crb@tiger-computing.co.uk> Cc: Philip Gaw <pgaw@darktech.org.uk> Cc: Hannes Reinecke <hare@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-05	futex: Make lookup_pi_state more robust	Thomas Gleixner	1	-28/+106
	The current implementation of lookup_pi_state has ambigous handling of the TID value 0 in the user space futex. We can get into the kernel even if the TID value is 0, because either there is a stale waiters bit or the owner died bit is set or we are called from the requeue_pi path or from user space just for fun. The current code avoids an explicit sanity check for pid = 0 in case that kernel internal state (waiters) are found for the user space address. This can lead to state leakage and worse under some circumstances. Handle the cases explicit: Waiter \| pi_state \| pi->owner \| uTID \| uODIED \| ? [1] NULL \| --- \| --- \| 0 \| 0/1 \| Valid [2] NULL \| --- \| --- \| >0 \| 0/1 \| Valid [3] Found \| NULL \| -- \| Any \| 0/1 \| Invalid [4] Found \| Found \| NULL \| 0 \| 1 \| Valid [5] Found \| Found \| NULL \| >0 \| 1 \| Invalid [6] Found \| Found \| task \| 0 \| 1 \| Valid [7] Found \| Found \| NULL \| Any \| 0 \| Invalid [8] Found \| Found \| task \| ==taskTID \| 0/1 \| Valid [9] Found \| Found \| task \| 0 \| 0 \| Invalid [10] Found \| Found \| task \| !=taskTID \| 0/1 \| Invalid [1] Indicates that the kernel can acquire the futex atomically. We came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit. [2] Valid, if TID does not belong to a kernel thread. If no matching thread is found then it indicates that the owner TID has died. [3] Invalid. The waiter is queued on a non PI futex [4] Valid state after exit_robust_list(), which sets the user space value to FUTEX_WAITERS \| FUTEX_OWNER_DIED. [5] The user space value got manipulated between exit_robust_list() and exit_pi_state_list() [6] Valid state after exit_pi_state_list() which sets the new owner in the pi_state but cannot access the user space value. [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set. [8] Owner and user space value match [9] There is no transient state which sets the user space TID to 0 except exit_robust_list(), but this is indicated by the FUTEX_OWNER_DIED bit. See [4] [10] There is no transient state which leaves owner and user space TID out of sync. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05	futex: Always cleanup owner tid in unlock_pi	Thomas Gleixner	1	-22/+18
	If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05	futex: Validate atomic acquisition in futex_lock_pi_atomic()	Thomas Gleixner	1	-3/+11
	We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05	futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)	Thomas Gleixner	1	-0/+25
	If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() in line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()") [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05	x86/smpboot: Initialize secondary CPU only if master CPU will wait for it	Igor Mammedov	2	-79/+47
	Hang is observed on virtual machines during CPU hotplug, especially in big guests with many CPUs. (It reproducible more often if host is over-committed). It happens because master CPU gives up waiting on secondary CPU and allows it to run wild. As result AP causes locking or crashing system. For example as described here: https://lkml.org/lkml/2014/3/6/257 If master CPU have sent STARTUP IPI successfully, and AP signalled to master CPU that it's ready to start initialization, make master CPU wait indefinitely till AP is onlined. To ensure that AP won't ever run wild, make it wait at early startup till master CPU confirms its intention to wait for AP. If AP doesn't respond in 10 seconds, the master CPU will timeout and cancel AP onlining. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05	x86/smpboot: Log error on secondary CPU wakeup failure at ERR level	Igor Mammedov	1	-1/+1
	If system is running without debug level logging, it will not log error if do_boot_cpu() failed to wakeup AP. It may lead to silent AP bringup failures at boot time. Change message level to KERN_ERR to make error visible to user as it's done on other architectures. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05	x86: Fix list/memory corruption on CPU hotplug	Igor Mammedov	1	-3/+0
	currently if AP wake up is failed, master CPU marks AP as not present in do_boot_cpu() by calling set_cpu_present(cpu, false). That leads to following list corruption on the next physical CPU hotplug: [ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0() [ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0). [ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee [ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387 [ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn [ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021 [ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8 [ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea [ 418.178445] Call Trace: [ 418.185811] [<ffffffff8159b22d>] dump_stack+0x49/0x5c [ 418.186440] [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0 [ 418.187192] [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50 [ 418.191231] [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7 [ 418.193889] [<ffffffff812f796e>] __list_add+0xbe/0xd0 [ 418.196649] [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200 [ 418.208610] [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60 [ 418.213831] [<ffffffff812e2ef4>] kobject_add+0x44/0x70 [ 418.229961] [<ffffffff813e2c60>] device_add+0xd0/0x550 [ 418.234991] [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0 [ 418.250226] [<ffffffff813e32be>] device_register+0x1e/0x30 [ 418.255296] [<ffffffff813e82a3>] register_cpu+0xe3/0x130 [ 418.266539] [<ffffffff81592be5>] arch_register_cpu+0x65/0x150 [ 418.285845] [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b ... Which is caused by the fact that generic_processor_info() allocates logical CPU id by calling: cpu = cpumask_next_zero(-1, cpu_present_mask); which returns id of previously failed to wake up CPU, since its bit is cleared by do_boot_cpu() and as result register_cpu() tries to register another CPU with the same id as already present but failed to be onlined CPU. Taking in account that AP will not do anything if master CPU failed to wake it up, there is no reason to mark that AP as not present and break next cpu hotplug attempts. As a side effect of not marking AP as not present, user would be allowed to online it again later. Also fix memory corruption in acpi_unmap_lsapic() if during CPU hotplug master CPU failed to wake up AP it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP. However following attempt to unplug that CPU will lead to out of bound write access to __apicid_to_node[] which is 32768 items long on x86_64 kernel. So with above fix of cpu_present_mask make sure that a present CPU has a valid APIC ID by not setting x86_cpu_to_apicid to BAD_APICID in do_boot_cpu() on failure and allow acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05	sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock	Roman Gushchin	3	-7/+6
	tg_set_cfs_bandwidth() sets cfs_b->timer_active to 0 to force the period timer restart. It's not safe, because can lead to deadlock, described in commit 927b54fccbf0: "__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock, waiting for the hrtimer to finish. However, if sched_cfs_period_timer runs for another loop iteration, the hrtimer can attempt to take rq->lock, resulting in deadlock." Three CPUs must be involved: CPU0 CPU1 CPU2 take rq->lock period timer fired ... take cfs_b lock ... ... tg_set_cfs_bandwidth() throttle_cfs_rq() release cfs_b lock take cfs_b lock ... distribute_cfs_runtime() timer_active = 0 take cfs_b->lock wait for rq->lock ... __start_cfs_bandwidth() {wait for timer callback break if timer_active == 1} So, CPU0 and CPU1 are deadlocked. Instead of resetting cfs_b->timer_active, tg_set_cfs_bandwidth can wait for period timer callbacks (ignoring cfs_b->timer_active) and restart the timer explicitly. Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/87wqdi9g8e.wl\%klamm@yandex-team.ru Cc: pjt@google.com Cc: chris.j.arges@canonical.com Cc: gregkh@linuxfoundation.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05	sched/dl: Fix race in dl_task_timer()	Kirill Tkhai	1	-1/+9
	Throttled task is still on rq, and it may be moved to other cpu if user is playing with sched_setaffinity(). Therefore, unlocked task_rq() access makes the race. Juri Lelli reports he got this race when dl_bandwidth_enabled() was not set. Other thing, pointed by Peter Zijlstra: "Now I suppose the problem can still actually happen when you change the root domain and trigger a effective affinity change that way". To fix that we do the same as made in __task_rq_lock(). We do not use __task_rq_lock() itself, because it has a useful lockdep check, which is not correct in case of dl_task_timer(). We do not need pi_lock locked here. This case is an exception (PeterZ): "The only reason we don't strictly need ->pi_lock now is because we're guaranteed to have p->state == TASK_RUNNING here and are thus free of ttwu races". Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> # v3.14+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3056991400578422@web14g.yandex.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05	sched: Fix sched_policy < 0 comparison	Richard Weinberger	1	-1/+1
	attr.sched_policy is u32, therefore a comparison against < 0 is never true. Fix this by casting sched_policy to int. This issue was reported by coverity CID 1219934. Fixes: dbdb22754fde ("sched: Disallow sched_attr::sched_policy < 0") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1401741514-7045-1-git-send-email-richard@nod.at Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05	sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled	Steven Rostedt	1	-3/+4
	As Peter Zijlstra told me, we have the following path: do_exit() exit_itimers() itimer_delete() spin_lock_irqsave(&timer->it_lock, &flags); timer_delete_hook(timer); kc->timer_del(timer) := posix_cpu_timer_del() put_task_struct() __put_task_struct() task_numa_free() spin_lock(&grp->lock); Which means that task_numa_free() can be called with interrupts disabled, which means that we should not be using spin_lock_irq() but spin_lock_irqsave() instead. Otherwise we are enabling interrupts while holding an interrupt unsafe lock! Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner<tglx@linutronix.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140527182541.GH11096@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-04	percpu-refcount: fix usage of this_cpu_ops	Sebastian Ott	1	-3/+3
	The percpu-refcount infrastructure uses the underscore variants of this_cpu_ops in order to modify percpu reference counters. (e.g. __this_cpu_inc()). However the underscore variants do not atomically update the percpu variable, instead they may be implemented using read-modify-write semantics (more than one instruction). Therefore it is only safe to use the underscore variant if the context is always the same (process, softirq, or hardirq). Otherwise it is possible to lose updates. This problem is something that Sebastian has seen within the aio subsystem which uses percpu refcounters both in process and softirq context leading to reference counts that never dropped to zeroes; even though the number of "get" and "put" calls matched. Fix this by using the non-underscore this_cpu_ops variant which provides correct per cpu atomic semantics and fixes the corrupted reference counts. Cc: Kent Overstreet <kmo@daterainc.com> Cc: <stable@vger.kernel.org> # v3.11+ Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org> References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett
2014-06-04	perf probe: Fix perf probe to find correct variable DIE	Masami Hiramatsu	1	-2/+5
	Fix perf probe to find correct variable DIE which has location or external instance by tracking down the lexical blocks. Current die_find_variable() expects that the all variable DIEs which has DW_TAG_variable have a location. However, since recent dwarf information may have declaration variable DIEs at the entry of function (subprogram), die_find_variable() returns it. To solve this problem, it must track down the DIE tree to find a DIE which has an actual location or a reference for external instance. e.g. finding a DIE which origin is <0xdc73>; <1><11496>: Abbrev Number: 95 (DW_TAG_subprogram) <11497> DW_AT_abstract_origin: <0xdc42> <1149b> DW_AT_low_pc : 0x1850 [...] <2><114cc>: Abbrev Number: 119 (DW_TAG_variable) <- this is a declaration <114cd> DW_AT_abstract_origin: <0xdc73> <2><114d1>: Abbrev Number: 119 (DW_TAG_variable) [...] <3><115a7>: Abbrev Number: 105 (DW_TAG_lexical_block) <115a8> DW_AT_ranges : 0xaa0 <4><115ac>: Abbrev Number: 96 (DW_TAG_variable) <- this has a location <115ad> DW_AT_abstract_origin: <0xdc73> <115b1> DW_AT_location : 0x486c (location list) Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20140529121930.30879.87092.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2014-06-04	perf probe: Fix a segfault if asked for variable it doesn't find	Masami Hiramatsu	1	-2/+2
	Fix a segfault bug by asking for variable it doesn't find. Since the convert_variable() didn't handle error code returned from convert_variable_location(), it just passed an incomplete variable field and then a segfault was occurred when formatting the field. This fixes that bug by handling success code correctly in convert_variable(). Other callers of convert_variable_location() are correctly checking the return code. This bug was introduced by following commit. But another hidden erroneous error handling has been there previously (-ENOMEM case). commit 3d918a12a1b3088ac16ff37fa52760639d6e2403 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20140529105232.28251.30447.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2014-06-04	x86: irq: Get correct available vectors for cpu disable	Yinghai Lu	1	-4/+12
	check_irq_vectors_for_cpu_disable() can overestimate the number of available interrupt vectors, so the check for cpu down succeeds, but the actual cpu removal fails. It iterates from FIRST_EXTERNAL_VECTOR to NR_VECTORS, which is wrong because the systems vectors are not taken into account. Limit the search to first_system_vector instead of NR_VECTORS. The second indicator for vector availability the used_vectors bitmap is not taken into account at all. So system vectors, e.g. IA32_SYSCALL_VECTOR (0x80) and IRQ_MOVE_CLEANUP_VECTOR (0x20), are accounted as available. Add a check for the used_vectors bitmap and do not account vectors which are marked there. [ tglx: Simplified code. Rewrote changelog and code comments. ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Elliott, Robert (Server Storage)" <Elliott@hp.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1400160305-17774-2-git-send-email-prarit@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-03	iser-target: Fix multi network portal shutdown regression	Nicholas Bellinger	2	-1/+3
	This patch fixes a iser-target specific regression introduced in v3.15-rc6 with: commit 14f4b54fe38f3a8f8392a50b951c8aa43b63687a Author: Sagi Grimberg <sagig@mellanox.com> Date: Tue Apr 29 13:13:47 2014 +0300 Target/iscsi,iser: Avoid accepting transport connections during stop stage where the change to set iscsi_np->enabled = false within iscsit_clear_tpg_np_login_thread() meant that a iscsi_np with two iscsi_tpg_np exports would have it's parent iscsi_np set to a disabled state, even if other iscsi_tpg_np exports still existed. This patch changes iscsit_clear_tpg_np_login_thread() to only set iscsi_np->enabled = false when shutdown = true, and also changes iscsit_del_np() to set iscsi_np->enabled = true when iscsi_np->np_exports is non zero. Cc: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-03	iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value()	Roland Dreier	1	-39/+31
	In non-leading connection login, iscsi_login_non_zero_tsih_s1() calls iscsi_change_param_value() with the buffer it uses to hold the login PDU, not a temporary buffer. This leads to the login header getting corrupted and login failing for non-leading connections in MC/S. Fix this by adding a wrapper iscsi_change_param_sprintf() that handles the temporary buffer itself to avoid confusion. Also handle sending a reject in case of failure in the wrapper, which lets the calling code get quite a bit smaller and easier to read. Finally, bump the size of the temporary buffer from 32 to 64 bytes to be safe, since "MaxRecvDataSegmentLength=" by itself is 25 bytes; with a trailing NUL, a value >= 1M will lead to a buffer overrun. (This isn't the default but we don't need to run right at the ragged edge here) Reported-by: Santosh Kulkarni <santosh.kulkarni@calsoftinc.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-03	iser-target: Add missing target_put_sess_cmd for ImmedateData failure	Nicholas Bellinger	1	-0/+2
	This patch addresses a bug where an early exception for SCSI WRITE with ImmediateData=Yes was missing the target_put_sess_cmd() call to drop the extra se_cmd->cmd_kref reference obtained during the normal iscsit_setup_scsi_cmd() codepath execution. This bug was manifesting itself during session shutdown within isert_cq_rx_comp_err() where target_wait_for_sess_cmds() would end up waiting indefinately for the last se_cmd->cmd_kref put to occur for the failed SCSI WRITE + ImmediateData descriptors. This fix follows what traditional iscsi-target code already does for the same failure case within iscsit_get_immediate_data(). Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-03	kernfs: move the last knowledge of sysfs out from kernfs	Jianyu Zhan	4	-12/+20
	There is still one residue of sysfs remaining: the sb_magic SYSFS_MAGIC. However this should be kernfs user specific, so this patch moves it out. Kerrnfs user should specify their magic number while mouting. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-02	net: filter: fix possible memory leak in __sk_prepare_filter()	Leon Yu	1	-1/+6
	__sk_prepare_filter() was reworked in commit bd4cf0ed3 (net: filter: rework/optimize internal BPF interpreter's instruction set) so that it should have uncharged memory once things went wrong. However that work isn't complete. Error is handled only in __sk_migrate_filter() while memory can still leak in the error path right after sk_chk_filter(). Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Leon Yu <chianglungyu@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	net: ec_bhf: Add runtime dependencies	Jean Delvare	1	-0/+1
	The ec_bhf driver is specific to the Beckhoff CX embedded PC series. These are based on Intel x86 CPU. So we can add a dependency on X86, with COMPILE_TEST as an alternative to still allow for broader build-testing. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Darek Marcinkiewicz <reksio@newterm.pl> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	libata: Blacklist queued trim for Crucial M500	Martin K. Petersen	1	-4/+4
	Queued trim only works for some users with MU05 firmware. Revert to blacklisting all firmware versions. Introduced by commit d121f7d0cbb8 ("libata: Update queued trim blacklist for M5x0 drives") which this effectively reverts, while retaining the blacklisting of M550. See https://bugzilla.kernel.org/show_bug.cgi?id=71371 for reports of trouble with MU05 firmware. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-02	tcp: fix cwnd undo on DSACK in F-RTO	Yuchung Cheng	1	-6/+5
	This bug is discovered by an recent F-RTO issue on tcpm list https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html The bug is that currently F-RTO does not use DSACK to undo cwnd in certain cases: upon receiving an ACK after the RTO retransmission in F-RTO, and the ACK has DSACK indicating the retransmission is spurious, the sender only calls tcp_try_undo_loss() if some never retransmisted data is sacked (FLAG_ORIG_DATA_SACKED). The correct behavior is to unconditionally call tcp_try_undo_loss so the DSACK information is used properly to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	netlink: Only check file credentials for implicit destinations	Eric W. Biederman	2	-4/+10
	It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit aa4cf9452f469f16cea8c96283b641b4576d4a7b Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	ipheth: Add support for iPad 2 and iPad 3	Kristian Evensen	1	-0/+10
	Each iPad model has a different product id, this patch adds support for iPad 2 (pid 0x12a2) and iPad 3 (pid 0x12a6). Note that iPad 2 must be jailbroken and a third-party app must be used for tethering to work. On iPad 3, tethering works out of the box (assuming your ISP is nice). Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	team: fix mtu setting	Jiri Pirko	2	-1/+7
	Now it is not possible to set mtu to team device which has a port enslaved to it. The reason is that when team_change_mtu() calls dev_set_mtu() for port device, notificator for NETDEV_PRECHANGEMTU event is called and team_device_event() returns NOTIFY_BAD forbidding the change. So fix this by returning NOTIFY_DONE here in case team is changing mtu in team_change_mtu(). Introduced-by: 3d249d4c "net: introduce ethernet teaming device" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	net: fix inet_getid() and ipv6_select_ident() bugs	Eric Dumazet	2	-16/+4
	I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery is disabled. Note how GSO/TSO packets do not have monotonically incrementing ID. 06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396) 06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212) 06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972) 06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292) 06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764) It appears I introduced this bug in linux-3.1. inet_getid() must return the old value of peer->ip_id_count, not the new one. Lets revert this part, and remove the prevention of a null identification field in IPv6 Fragment Extension Header, which is dubious and not even done properly. Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI	Aleksander Morgado	1	-1/+0
	This interface is unusable, as the cdc-wdm character device doesn't reply to any QMI command. Also, the out-of-tree Sierra Wireless GobiNet driver fully skips it. Signed-off-by: Aleksander Morgado <aleksander@aleksander.es> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	net: qmi_wwan: add additional Sierra Wireless QMI devices	Aleksander Morgado	1	-0/+4
	A set of new VID/PIDs retrieved from the out-of-tree GobiNet/GobiSerial Sierra Wireless drivers. Signed-off-by: Aleksander Morgado <aleksander@aleksander.es> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	bridge: Prevent insertion of FDB entry with disallowed vlan	Toshiaki Makita	3	-2/+37
	br_handle_local_finish() is allowing us to insert an FDB entry with disallowed vlan. For example, when port 1 and 2 are communicating in vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can interfere with their communication by spoofed src mac address with vlan id 10. Note: Even if it is judged that a frame should not be learned, it should not be dropped because it is destined for not forwarding layer but higher layer. See IEEE 802.1Q-2011 8.13.10. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	netlink: rate-limit leftover bytes warning and print process name	Michal Schmidt	1	-2/+2
	Any process is able to send netlink messages with leftover bytes. Make the warning rate-limited to prevent too much log spam. The warning is supposed to help find userspace bugs, so print the triggering command name to implicate the buggy program. [v2: Use pr_warn_ratelimited instead of printk_ratelimited.] Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02	ALSA: hda/realtek - Fix COEF widget NID for ALC260 replacer fixup	Takashi Iwai	1	-2/+2
	The conversion to a fixup table for Replacer model with ALC260 in commit 20f7d928 took the wrong widget NID for COEF setups. Namely, NID 0x1a should have been used instead of NID 0x20, which is the common node for all Realtek codecs but ALC260. Fixes: 20f7d928fa6e ('ALSA: hda/realtek - Replace ALC260 model=replacer with the auto-parser') Cc: <stable@vger.kernel.org> [v3.4+] Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-06-02	ALSA: hda/realtek - Correction of fixup codes for PB V7900 laptop	Ronan Marquet	1	-4/+2
	Correcion of wrong fixup entries add in commit ca8f0424 to replace static model quirk for PB V7900 laptop (will model). [note: the removal of ALC260_FIXUP_HP_PIN_0F chain is also needed as a part of the fix; otherwise the pin is set up wrongly as a headphone, and user-space (PulseAudio) may be wrongly trying to detect the jack state -- tiwai] Fixes: ca8f04247eaa ('ALSA: hda/realtek - Add the fixup codes for ALC260 model=will') Signed-off-by: Ronan Marquet <ronan.marquet@orange.fr> Cc: <stable@vger.kernel.org> [v3.4+] Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-06-02	x86/efi: Do not export efi runtime map in case old map	Dave Young	1	-0/+3
	For ioremapped efi memory aka old_map the virt addresses are not persistant across kexec reboot. kexec-tools will read the runtime maps from sysfs then pass them to 2nd kernel and assuming kexec efi boot is ok. This will cause kexec boot failure. To address this issue do not export runtime maps in case efi old_map so userspace can use no efi boot instead. Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-06-02	intel_pstate: Improve initial busy calculation	Doug Smythies	1	-5/+8
	This change makes the busy calculation using 64 bit math which prevents overflow for large values of aperf/mperf. Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02	intel_pstate: add sample time scaling	Dirk Brandewie	1	-1/+17
	The PID assumes that samples are of equal time, which for a deferable timers this is not true when the system goes idle. This causes the PID to take a long time to converge to the min P state and depending on the pattern of the idle load can make the P state appear stuck. The hold-off value of three sample times before using the scaling is to give a grace period for applications that have high performance requirements and spend a lot of time idle, The poster child for this behavior is the ffmpeg benchmark in the Phoronix test suite. Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02	intel_pstate: Correct rounding in busy calculation	Dirk Brandewie	1	-5/+7
	Changing to fixed point math throughout the busy calculation in commit e66c1768 (Change busy calculation to use fixed point math.) Introduced some inaccuracies by rounding the busy value at two points in the calculation. This change removes roundings and moves the rounding to the output of the PID where the calculations are complete and the value returned as an integer. Fixes: e66c17683746 (intel_pstate: Change busy calculation to use fixed point math.) Reported-by: Doug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02	intel_pstate: Remove C0 tracking	Dirk Brandewie	1	-12/+1
	Commit fcb6a15c (intel_pstate: Take core C0 time into account for core busy calculation) introduced a regression referenced below. The issue with "lockup" after suspend that this commit was addressing is now dealt with in the suspend path. Fixes: fcb6a15c2e7e (intel_pstate: Take core C0 time into account for core busy calculation) Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581 Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121 Reported-by: Doug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02	drm/radeon: use the CP DMA on CIK	Christian König	1	-2/+2
	The SDMA sometimes doesn't seem to work reliable. Signed-off-by: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org
2014-06-02	drm/radeon: sync page table updates	Christian König	1	-2/+7
	Only necessary if we don't use the same engine for buffer moves and table updates. Signed-off-by: Christian König <christian.koenig@amd.com>
2014-06-02	drm/radeon: fix vm buffer size estimation	Christian König	1	-1/+1
	Only relevant if we got VM_BLOCK_SIZE>9, but better save than sorry. Signed-off-by: Christian König <christian.koenig@amd.com>