aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/reboot.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2013-09-25powerpc/pseries: Do not start secondaries in Open FirmwareBenjamin Herrenschmidt2-10/+37
Starting secondary CPUs early on from Open Firmware and placing them in a holding spin loop slows down the boot process significantly under some hypervisors such as KVM. This is also unnecessary when RTAS supports querying the CPU state So let's not do it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-25powerpc/zImage: make the "OF" wrapper support ePAPR bootBenjamin Herrenschmidt5-9/+33
This makes the "OF" zImage wrapper (zImage.pseries, zImage.pmac, zImage.maple) work if booted via a flat device-tree (ePAPR boot mode), and thus potentially usable with kexec. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-25powerpc: Remove ksp_limit on ppc64Benjamin Herrenschmidt6-18/+23
We've been keeping that field in thread_struct for a while, it contains the "limit" of the current stack pointer and is meant to be used for detecting stack overflows. It has a few problems however: - First, it was never actually *used* on 64-bit. Set and updated but not actually exploited - When switching stack to/from irq and softirq stacks, it's update is racy unless we hard disable interrupts, which is costly. This is fine on 32-bit as we don't soft-disable there but not on 64-bit. Thus rather than fixing 2 in order to implement 1 in some hypothetical future, let's remove the code completely from 64-bit. In order to avoid a clutter of ifdef's, we remove the updates from C code completely during interrupt stack switching, and instead maintain it from the asm helper that is used to do the stack switching in the first place. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-25powerpc/irq: Run softirqs off the top of the irq stackBenjamin Herrenschmidt4-65/+62
Nowadays, irq_exit() calls __do_softirq() pretty much directly instead of calling do_softirq() which switches to the decicated softirq stack. This has lead to observed stack overflows on powerpc since we call irq_enter() and irq_exit() outside of the scope that switches to the irq stack. This fixes it by moving the stack switching up a level, making irq_enter() and irq_exit() run off the irq stack. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-24mm: Place preemption point in do_mlockall() loopPaul E. McKenney1-0/+1
There is a loop in do_mlockall() that lacks a preemption point, which means that the following can happen on non-preemptible builds of the kernel. Dave Jones reports: "My fuzz tester keeps hitting this. Every instance shows the non-irq stack came in from mlockall. I'm only seeing this on one box, but that has more ram (8gb) than my other machines, which might explain it. INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0) sending NMI to all CPUs: NMI backtrace for cpu 3 CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32 Call Trace: lru_add_drain_all+0x15/0x20 SyS_mlockall+0xa5/0x1a0 tracesys+0xdd/0xe2" This commit addresses this problem by inserting the required preemption point. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Michel Lespinasse <walken@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24dts: Fix misspelling of SynopsysDinh Nguyen4-15/+15
s/Synopsis/Synopsys s/synopsis/synopsys Signed-off-by: Dinh Nguyen <dinguyen@altera.com> Cc: Pavel Machek <pavel@denx.de> CC: Arnd Bergmann <arnd@arndb.de> CC: Olof Johansson <olof@lixom.net> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Chris Ball <cjb@laptop.org> Cc: Jaehoon Chung <jh80.chung@samsung.com> Cc: Seungwon Jeon <tgih.jun@samsung.com> Cc: Tomasz Figa <tomasz.figa@gmail.com> Cc: devicetree@vger.kernel.org Cc: linux-mmc@vger.kernel.org CC: linux-arm-kernel@lists.infradead.org Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
2013-09-24of: clean-up ifdefs in of_irq.hRob Herring1-12/+8
Much of of_irq.h is needlessly ifdef'ed. Clean this up and minimize the amount ifdef'ed code. This fixes some build warnings when CONFIG_OF is not enabled (seen on i386 and x86_64): include/linux/of_irq.h:82:7: warning: 'struct device_node' declared inside parameter list [enabled by default] include/linux/of_irq.h:82:7: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] include/linux/of_irq.h:87:47: warning: 'struct device_node' declared inside parameter list [enabled by default] Compile tested on i386, sparc and arm. Reported-by: Randy Dunlap <rdunlap@infradead.org> Cc: Grant Likely <grant.likely@linaro.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
2013-09-24openrisc: clean-up prom.hRob Herring1-44/+0
Clean-up some copy/paste declarations that are not necessary. All the functions either don't exist or are already declared in other headers. This is needed in preparation of of_irq.h clean-up. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: linux@lists.openrisc.net
2013-09-25cpufreq: exynos5440: Fix potential NULL pointer dereferenceSachin Kamat1-1/+1
If 'dvfs_info' is NULL (due to devm_kzalloc failure) the failure error message would try to dereference it. Use 'pdev' instead. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get()Viresh Kumar1-0/+3
cpufreq_get() can be called from external drivers which might not be aware if cpufreq driver is registered or not. And so we should actually check if cpufreq driver is registered or not and also if cpufreq is active or disabled, at the beginning of cpufreq_get(). Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy). Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25acpi-cpufreq: skip loading acpi_cpufreq after intel_pstateYinghai Lu1-0/+4
If the hw supports intel_pstate and acpi_cpufreq, intel_pstate will get loaded first. acpi_cpufreq_init() will call acpi_cpufreq_early_init() and that will allocate perf data and init those perf data in ACPI core, (that will cover all CPUs). But later it will free them as cpufreq_register_driver(acpi_cpufreq) will fail as intel_pstate is already registered Use cpufreq_get_current_driver() to check if we can skip the acpi_cpufreq loading. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25ACPI / IPMI: Fix atomic context requirement of ipmi_msg_handler()Lv Zheng1-10/+14
This patch fixes the issues indicated by the test results that ipmi_msg_handler() is invoked in atomic context. BUG: scheduling while atomic: kipmi0/18933/0x10000100 Modules linked in: ipmi_si acpi_ipmi ... CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G AW 3.10.0-rc7+ #2 Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010 ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8 ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4 0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8 Call Trace: <IRQ> [<ffffffff814c4a1e>] dump_stack+0x19/0x1b [<ffffffff814bfbab>] __schedule_bug+0x46/0x54 [<ffffffff814c73e0>] __schedule+0x83/0x59c [<ffffffff81058853>] __cond_resched+0x22/0x2d [<ffffffff814c794b>] _cond_resched+0x14/0x1d [<ffffffff814c6d82>] mutex_lock+0x11/0x32 [<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58 [<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si] [<ffffffff812bf6e4>] deliver_response+0x55/0x5a [<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65 [<ffffffff81007ad1>] ? read_tsc+0x9/0x19 [<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc [<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si] ... Also Tony Camuso says: We were getting occasional "Scheduling while atomic" call traces during boot on some systems. Problem was first seen on a Cisco C210 but we were able to reproduce it on a Cisco c220m3. Setting CONFIG_LOCKDEP and LOCKDEP_SUPPORT to 'y' exposed a lockdep around tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device. ================================= [ INFO: inconsistent lock state ] 2.6.32-415.el6.x86_64-debug-splck #1 --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/3/17 [HC0[0]:SC1[1]:HE1:SE0] takes: (&ipmi_device->tx_msg_lock){+.?...}, at: [<ffffffff81337a27>] ipmi_msg_handler+0x71/0x126 {SOFTIRQ-ON-W} state was registered at: [<ffffffff810ba11c>] __lock_acquire+0x63c/0x1570 [<ffffffff810bb0f4>] lock_acquire+0xa4/0x120 [<ffffffff815581cc>] __mutex_lock_common+0x4c/0x400 [<ffffffff815586ea>] mutex_lock_nested+0x4a/0x60 [<ffffffff8133789d>] acpi_ipmi_space_handler+0x11b/0x234 [<ffffffff81321c62>] acpi_ev_address_space_dispatch+0x170/0x1be The fix implemented by this change has been tested by Tony: Tested the patch in a boot loop with lockdep debug enabled and never saw the problem in over 400 reboots. Reported-and-tested-by: Tony Camuso <tcamuso@redhat.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Reviewed-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24MAINTAINERS: update mach-bcm related email addressChristian Daudt1-1/+2
Update email address on mach-bcm + drivers for Broadcom mobile SoCs. Signed-off-by: Christian Daudt <csd@broadcom.com> Cc: Olof Johansson <olof@lixom.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24checkpatch: make extern in .h prototypes quieterJoe Perches1-2/+2
The use of extern in .h files is a bit contentious. Make the warning be emitted only when --strict is used on the command line. Signed-off-by: Joe Perches <joe@perches.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24cciss: fix info leak in cciss_ioctl32_passthru()Dan Carpenter1-0/+1
The arg64 struct has a hole after ->buf_size which isn't cleared. Or if any of the calls to copy_from_user() fail then that would cause an information leak as well. This was assigned CVE-2013-2147. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24cpqarray: fix info leak in ida_locked_ioctl()Dan Carpenter1-0/+1
The pciinfo struct has a two byte hole after ->dev_fn so stack information could be leaked to the user. This was assigned CVE-2013-2147. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24kernel/reboot.c: re-enable the function of variable reboot_defaultChuansheng Liu1-1/+8
Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") did some cleanup for reboot= command line, but it made the reboot_default inoperative. The default value of variable reboot_default should be 1, and if command line reboot= is not set, system will use the default reboot mode. [akpm@linux-foundation.org: fix comment layout] Signed-off-by: Li Fei <fei.li@intel.com> Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Acked-by: Robin Holt <robinmholt@linux.com> Cc: <stable@vger.kernel.org> [3.11.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24audit: fix endless wait in audit_log_start()Konstantin Khlebnikov1-2/+3
After commit 829199197a43 ("kernel/audit.c: avoid negative sleep durations") audit emitters will block forever if userspace daemon cannot handle backlog. After the timeout the waiting loop turns into busy loop and runs until daemon dies or returns back to work. This is a minimal patch for that bug. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Richard Guy Briggs <rgb@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Chuck Anderson <chuck.anderson@oracle.com> Cc: Dan Duval <dan.duval@oracle.com> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code"Andrew Morton3-60/+175
Revert commit 3b38722efd9f ("memcg, vmscan: integrate soft reclaim tighter with zone shrinking code") I merged this prematurely - Michal and Johannes still disagree about the overall design direction and the future remains unclear. Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24revert "memcg: get rid of soft-limit tree infrastructure"Andrew Morton1-2/+263
Revert commit e883110aad71 ("memcg: get rid of soft-limit tree infrastructure") I merged this prematurely - Michal and Johannes still disagree about the overall design direction and the future remains unclear. Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim"Andrew Morton3-15/+9
Revert commit a5b7c87f9207 ("vmscan, memcg: do softlimit reclaim also for targeted reclaim") I merged this prematurely - Michal and Johannes still disagree about the overall design direction and the future remains unclear. Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24revert "memcg: enhance memcg iterator to support predicates"Andrew Morton3-103/+32
Revert commit de57780dc659 ("memcg: enhance memcg iterator to support predicates") I merged this prematurely - Michal and Johannes still disagree about the overall design direction and the future remains unclear. Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24revert "memcg: track children in soft limit excess to improve soft limit"Andrew Morton1-71/+0
Revert commit 7d910c054be4 ("memcg: track children in soft limit excess to improve soft limit") I merged this prematurely - Michal and Johannes still disagree about the overall design direction and the future remains unclear. Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24revert "memcg, vmscan: do not attempt soft limit reclaim if it would not scan anything"Andrew Morton2-8/+2
Revert commit e839b6a1c8d0 ("memcg, vmscan: do not attempt soft limit reclaim if it would not scan anything") I merged this prematurely - Michal and Johannes still disagree about the overall design direction and the future remains unclear. Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24revert "memcg: track all children over limit in the root"Andrew Morton1-9/+0
Revert commit 1be171d60bdd ("memcg: track all children over limit in the root") I merged this prematurely - Michal and Johannes still disagree about the overall design direction and the future remains unclear. Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24revert "memcg, vmscan: do not fall into reclaim-all pass too quickly"Andrew Morton1-17/+2
Revert commit e975de998b96 ("memcg, vmscan: do not fall into reclaim-all pass too quickly") I merged this prematurely - Michal and Johannes still disagree about the overall design direction and the future remains unclear. Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24fs/ocfs2/super.c: use a bigger nodestr in ocfs2_dismount_volumeGoldwyn Rodrigues1-1/+1
While printing 32-bit node numbers, an 8-byte string is not enough. Increase the size of the string to 12 chars. This got left out in commit 49fa8140e487 ("fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers"). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24watchdog: update watchdog_thresh properlyMichal Hocko2-3/+56
watchdog_tresh controls how often nmi perf event counter checks per-cpu hrtimer_interrupts counter and blows up if the counter hasn't changed since the last check. The counter is updated by per-cpu watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh period which guarantees that hrtimer is scheduled 2 times per the main period. Both hrtimer and perf event are started together when the watchdog is enabled. So far so good. But... But what happens when watchdog_thresh is updated from sysctl handler? proc_dowatchdog will set a new sampling period and hrtimer callback (watchdog_timer_fn) will use the new value in the next round. The problem, however, is that nobody tells the perf event that the sampling period has changed so it is ticking with the period configured when it has been set up. This might result in an ear ripping dissonance between perf and hrtimer parts if the watchdog_thresh is increased. And even worse it might lead to KABOOM if the watchdog is configured to panic on such a spurious lockup. This patch fixes the issue by updating both nmi perf even counter and hrtimers if the threshold value has changed. The nmi one is disabled and then reinitialized from scratch. This has an unpleasant side effect that the allocation of the new event might fail theoretically so the hard lockup detector would be disabled for such cpus. On the other hand such a memory allocation failure is very unlikely because the original event is deallocated right before. It would be much nicer if we just changed perf event period but there doesn't seem to be any API to do that right now. It is also unfortunate that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we cannot use on_each_cpu() and do the same thing from the per-cpu context. The update from the current CPU should be safe because perf_event_disable removes the event atomically before it clears the per-cpu watchdog_ev so it cannot change anything under running handler feet. The hrtimer is simply restarted (thanks to Don Zickus who has pointed this out) if it is queued because we cannot rely it will fire&adopt to the new sampling period before a new nmi event triggers (when the treshold is decreased). [akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place] Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Fabio Estevam <festevam@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24watchdog: update watchdog attributes atomicallyMichal Hocko1-2/+5
proc_dowatchdog doesn't synchronize multiple callers which might lead to confusion when two parallel callers might confuse watchdog_enable_all_cpus resp watchdog_disable_all_cpus (eg watchdog gets enabled even if watchdog_thresh was set to 0 already). This patch adds a local mutex which synchronizes callers to the sysctl handler. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Fix flushes in writeback modeKent Overstreet1-6/+9
In writeback mode, when we get a cache flush we need to make sure we issue a flush to the backing device. The code for sending down an extra flush was wrong - by cloning the bio we were probably getting flags that didn't make sense for a bare flush, and also the old code was firing for FUA bios, for which we don't need to send a flush to the backing device. This was causing data corruption somehow - the mechanism was never determined, but this patch fixes it for the users that were seeing it. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Fix for handling overlapping extents when reading in a btree nodeKent Overstreet1-11/+28
btree_sort_fixup() was overly clever, because it was trying to avoid pulling a key off the btree iterator in more than one place. This led to a really obscure bug where we'd break early from the loop in btree_sort_fixup() if the current key overlapped with keys in more than one older set, and the next key it overlapped with was zero size. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Fix a shrinker deadlockKent Overstreet1-1/+1
GFP_NOIO means we could be getting called recursively - mca_alloc() -> mca_data_alloc() - definitely can't use mutex_lock(bucket_lock) then. Whoops. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Fix a dumb CPU spinning bug in writebackKent Overstreet1-2/+1
schedule_timeout() != schedule_timeout_uninterruptible() Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Fix a flush/fua performance bugKent Overstreet1-0/+1
bch_journal_meta() was missing the flush to make the journal write actually go down (instead of waiting up to journal_delay_ms)... Whoops Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Fix a writeback performance regressionKent Overstreet4-30/+43
Background writeback works by scanning the btree for dirty data and adding those keys into a fixed size buffer, then for each dirty key in the keybuf writing it to the backing device. When read_dirty() finishes and it's time to scan for more dirty data, we need to wait for the outstanding writeback IO to finish - they still take up slots in the keybuf (so that foreground writes can check for them to avoid races) - without that wait, we'll continually rescan when we'll be able to add at most a key or two to the keybuf, and that takes locks that starves foreground IO. Doh. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Correct printf()-style format length modifierGeert Uytterhoeven1-1/+1
Fix drivers/md/bcache/btree.c: In function ‘bch_btree_node_read’: drivers/md/bcache/btree.c:259: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘size_t’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Fix for when no journal entries are foundKent Overstreet1-12/+18
The journal replay code didn't handle this case, causing it to go into an infinite loop... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Strip endline when writing the label through sysfsGabriel de Perthuis1-2/+7
sysfs attributes with unusual characters have crappy failure modes in Squeeze (udev 164); later versions of udev are unaffected. This should make these characters more unusual. Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24bcache: Fix a dumb journal discard bugKent Overstreet1-1/+1
That switch statement was obviously wrong, leading to some sort of weird spinning on rare occasion with discards enabled... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24block: Fix bio_copy_data()Kent Overstreet1-2/+2
The memcpy() in bio_copy_data() was using the wrong offset vars, leading to data corruption in weird unusual setups. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-stable <stable@vger.kernel.org> # >= v3.9 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24xen/balloon: don't alloc page while non-preemptibleDavid Vrabel1-12/+11
get_balloon_scratch_page() disables preemption so we cannot call alloc_page() in between get/put_balloon_scratch_page(). Shuffle bits around in decrease_reservation() to avoid this. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-24xen: Do not enable spinlocks before jump_label_init() has executedKonrad Rzeszutek Wilk1-2/+24
xen_init_spinlocks() currently calls static_key_slow_inc() before jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is the case) the effect of this static_key_slow_inc() is deferred until after jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in which case the key is set immediately. Thus, depending on the value of config option, we may observe different behavior. In addition, when we come to __jump_label_transform() from jump_label_init(), the key (paravirt_ticketlocks_enabled) is already enabled. On processors where ideal_nop is not the same as default_nop this will cause a BUG() since it is expected that before a key is enabled the latter is replaced by the former during initialization. To address this problem we need to move static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called after jump_label_init(). We also need to make sure that this is done before other cpus start to boot. early_initcall appears to be a good place to do so. (Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops need to be set before alternative_instructions() runs.) Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Added extra comments in the code] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-24tpm: xen-tpmfront: Remove the locality sysfs attributeJason Gunthorpe1-29/+0
Upon deeper review it was agreed to remove the driver-unique 'locality' sysfs attribute before it is present in a released kernel. The attribute was introduced in e2683957fb268c6b29316fd9e7191e13239a30a5 during the 3.12 merge window, so this patch needs to go in before 3.12 is released. The hope is to have a well defined locality API that all the other locality aware drivers can use, perhaps in 3.13. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
2013-09-24tpm: xen-tpmfront: Fix default durationsJason Gunthorpe1-7/+0
All the default durations were being set to 10 minutes which is way too long for the timeouts. Normal values for the longest duration are around 5 mins, and short duration ar around .5s. Further, these are just the default, tpm_get_timeouts will set them to values from the TPM (or throw an error). Just remove them. Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-24drm/i915: preserve pipe A quirk in i9xx_set_pipeconfDaniel Vetter1-0/+4
This regression has been introduced in commit 9f11a9e4e50006b615ba94722dfc33ced89664cf Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Jun 13 00:54:58 2013 +0200 drm/i915: set up PIPECONF explicitly for i9xx/vlv platforms Ville brough up the idea that this is just the pipe A quirk gone wrong. Note that after resume the bios might or might not have enabled pipe A already. We have a bit of magic to make sure that on resume we set up a decent mode for pipe A, but I fear if I just smash pipe A to always on we'd enable it in a bogus state and hang the hw. Hence the readback. v2: Clarify the logic a bit as suggested by Chris. Also amend the commit message to clarify why we don't unconditionally enable the pipe. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66462 References: https://lkml.org/lkml/2013/8/26/238 Cc: Meelis Roos <mroos@ut.ee> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Use |= instead of = as suggested by Chris.] Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24drm/i915/tv: clear adjusted_mode.flagsDaniel Vetter1-0/+8
The native TV encoder has it's own flags to adjust sync modes and enabled interlaced modes which are totally irrelevant for the adjusted mode. This worked out nicely since the input modes used by both the load detect code and reported in the ->get_modes callbacks all have no flags set, and we also don't fill out any of them in the ->get_config callback. This changed with the additional sanitation done with commit 2960bc9cceecb5d556ce1c07656a6609e2f7e8b0 Author: Imre Deak <imre.deak@intel.com> Date: Tue Jul 30 13:36:32 2013 +0300 drm/i915: make user mode sync polarity setting explicit sinc now the "no flags at all" state wouldn't fit through core code any more. So fix this up again by explicitly clearing the flags in the ->compute_config callback. Aside: We have zero checking in place to make sure that the requested mode is indeed the right input mode we want for the selected TV mode. So we'll happily fall over if userspace tries to pull us. But that's definitely work for a different patch series. So just add a FIXME comment for now. Reported-by: Knut Petersen <Knut_Petersen@t-online.de> Cc: Knut Petersen <Knut_Petersen@t-online.de> Cc: Imre Deak <imre.deak@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Knut Petersen <Knut_Petersen@t-online.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24drm/i915/dp: increase i2c-over-aux retry interval on AUX DEFERJani Nikula1-1/+12
There is no clear cut rules or specs for the retry interval, as there are many factors that affect overall response time. Increase the interval, and even more so on branch devices which may have limited i2c bit rates. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=60263 Tested-by: Nicolas Suzor <nic@suzor.com> Reviewed-by: Todd Previte <tprevite@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24xfs: log recovery lsn ordering needs uuid checkDave Chinner1-14/+59
After a fair number of xfstests runs, xfs/182 started to fail regularly with a corrupted directory - a directory read verifier was failing after recovery because it found a block with a XARM magic number (remote attribute block) rather than a directory data block. The first time I saw this repeated failure I did /something/ and the problem went away, so I was never able to find the underlying problem. Test xfs/182 failed again today, and I found the root cause before I did /something else/ that made it go away. Tracing indicated that the block in question was being correctly logged, the log was being flushed by sync, but the buffer was not being written back before the shutdown occurred. Tracing also indicated that log recovery was also reading the block, but then never writing it before log recovery invalidated the cache, indicating that it was not modified by log recovery. More detailed analysis of the corpse indicated that the filesystem had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which indicated it was a block from an older filesystem. The reason that log recovery didn't replay it was that the LSN in the XARM block was larger than the LSN of the transaction being replayed, and so the block was not overwritten by log recovery. Hence, log recovery cant blindly trust the magic number and LSN in the block - it must verify that it belongs to the filesystem being recovered before using the LSN. i.e. if the UUIDs don't match, we need to unconditionally recovery the change held in the log. This patch was first tested on a block device that was repeatedly causing xfs/182 to fail with the same failure on the same block with the same directory read corruption signature (i.e. XARM block). It did not fail, and hasn't failed since. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24xfs: fix XFS_IOC_FREE_EOFBLOCKS definitionDave Chinner1-1/+1
It uses a kernel internal structure in it's definition rather than the user visible structure that is passed to the ioctl. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24xfs: asserting lock not held during freeing not validDave Chinner1-5/+4
When we free an inode, we do so via RCU. As an RCU lookup can occur at any time before we free an inode, and that lookup takes the inode flags lock, we cannot safely assert that the flags lock is not held just before marking it dead and running call_rcu() to free the inode. We check on allocation of a new inode structre that the lock is not held, so we still have protection against locks being leaked and hence not correctly initialised when allocated out of the slab. Hence just remove the assert... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>