linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-02-04	perf: Drop module reference on event init failure	Mark Rutland	1	-12/+16
	When initialising an event, perf_init_event will call try_module_get() to ensure that the PMU's module cannot be removed for the lifetime of the event, with __free_event() dropping the reference when the event is finally destroyed. If something fails after the event has been initialised, but before the event is installed, perf_event_alloc will drop the reference on the module. However, if we fail to initialise an event for some reason (e.g. we ask an uncore PMU to perform sampling, and it refuses to initialise the event), we do not drop the refcount. If we try to open such a bogus event without a precise IDR type, we will loop over each PMU in the pmus list, incrementing each of their refcounts without decrementing them. This patch adds a module_put when pmu->event_init(event) fails, ensuring that the refcounts are balanced in failure cases. As the innards of the precise and search based initialisation look very similar, this logic is hoisted out into a new helper function. While the early return for the failed try_module_get is removed from the search case, this is handled by the remaining return when ret is not -ENOENT. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420642611-22667-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04	perf: Use POLLIN instead of POLL_IN for perf poll data in flag	Jiri Olsa	1	-1/+2
	Currently we flag available data (via poll syscall) on perf fd with POLL_IN macro, which is normally used for SIGIO interface. We've been lucky, because POLLIN (0x1) is subset of POLL_IN (0x20001) and sys_poll (do_pollfd function) cut the extra bit out (0x20000). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422467678-22341-1-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04	perf: Fix put_event() ctx lock	Peter Zijlstra	1	-5/+12
	So what I suspect; but I'm in zombie mode today it seems; is that while I initially thought that it was impossible for ctx to change when refcount dropped to 0, I now suspect its possible. Note that until perf_remove_from_context() the event is still active and visible on the lists. So a concurrent sys_perf_event_open() from another task into this task can race. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: mark.rutland@arm.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134434.GB26304@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04	perf: Fix move_group() order	Peter Zijlstra (Intel)	1	-9/+47
	Jiri reported triggering the new WARN_ON_ONCE in event_sched_out over the weekend: event_sched_out.isra.79+0x2b9/0x2d0 group_sched_out+0x69/0xc0 ctx_sched_out+0x106/0x130 task_ctx_sched_out+0x37/0x70 __perf_install_in_context+0x70/0x1a0 remote_function+0x48/0x60 generic_exec_single+0x15b/0x1d0 smp_call_function_single+0x67/0xa0 task_function_call+0x53/0x80 perf_install_in_context+0x8b/0x110 I think the below should cure this; if we install a group leader it will iterate the (still intact) group list and find its siblings and try and install those too -- even though those still have the old event->ctx -- in the new ctx. Upon installing the first group sibling we'd try and schedule out the group and trigger the above warn. Fix this by installing the group leader last, installing siblings would have no effect, they're not reachable through the group lists and therefore we don't schedule them. Also delay resetting the state until we're absolutely sure the events are quiescent. Reported-by: Jiri Olsa <jolsa@redhat.com> Reported-by: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150126162639.GA21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04	perf: Fix event->ctx locking	Peter Zijlstra	1	-37/+207
	There have been a few reported issues wrt. the lack of locking around changing event->ctx. This patch tries to address those. It avoids the whole rwsem thing; and while it appears to work, please give it some thought in review. What I did fail at is sensible runtime checks on the use of event->ctx, the RCU use makes it very hard. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04	perf: Add a bit of paranoia	Peter Zijlstra	1	-1/+18
	Add a few WARN()s to catch things that should never happen. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.150481799@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-01	Linux 3.19-rc7	Linus Torvalds	1	-1/+1

2015-02-01	sched: don't cause task state changes in nested sleep debugging	Linus Torvalds	2	-4/+3
	Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep. - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Rafael J Wysocki <rjw@rjwysocki.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-01	Input: elantech - add more Fujtisu notebooks to force crc_enabled	Rainer Koenig	1	-0/+16
	Add two more Fujitsu LIFEBOOK models that also ship with the Elantech touchpad and don't work with crc_disabled to the quirk list. Signed-off-by: Rainer Koenig <Rainer.Koenig@ts.fujitsu.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2015-01-30	arc: mm: Fix build failure	Guenter Roeck	1	-1/+1
	Fix misspelled define. Fixes: 33692f27597f ("vm: add VM_FAULT_SIGSEGV handling support") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-30	i2c: sh_mobile: terminate DMA reads properly	Wolfram Sang	1	-1/+11
	DMA read requests could miss proper termination, so two more bytes would have been read via PIO overwriting the end of the buffer with wrong data. Make DMA stop handling more readable while we are here. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2015-01-30	KVM: x86: check LAPIC presence when building apic_map	Radim Krčmář	1	-0/+3
	We forgot to re-check LAPIC after splitting the loop in commit 173beedc1601 (KVM: x86: Software disabled APIC should still deliver NMIs, 2014-11-02). Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: 173beedc1601f51dae9d579aa7a414c5aa8f700b Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30	drm: fix fb-helper vs MST dangling connector ptrs (v2)	Rob Clark	1	-0/+30
	VT switch back/forth from console to xserver (for example) has potential to go horribly wrong if a dynamic DP MST connector ends up in the saved modeset that is restored when switching back to fbcon. When removing a dynamic connector, don't forget to clean up the saved state. v1: original v2: null out set->fb if no more connectors to avoid making i915 cranky Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1184968 Cc: stable@vger.kernel.org #v3.17+ Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-01-29	arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault	Marc Zyngier	3	-18/+50
	When handling a fault in stage-2, we need to resync I$ and D$, just to be sure we don't leave any old cache line behind. That's very good, except that we do so using the user address. Under heavy load (swapping like crazy), we may end up in a situation where the page gets mapped in stage-2 while being unmapped from userspace by another CPU. At that point, the DC/IC instructions can generate a fault, which we handle with kvm->mmu_lock held. The box quickly deadlocks, user is unhappy. Instead, perform this invalidation through the kernel mapping, which is guaranteed to be present. The box is much happier, and so am I. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-29	arm/arm64: KVM: Invalidate data cache on unmap	Marc Zyngier	3	-15/+116
	Let's assume a guest has created an uncached mapping, and written to that page. Let's also assume that the host uses a cache-coherent IO subsystem. Let's finally assume that the host is under memory pressure and starts to swap things out. Before this "uncached" page is evicted, we need to make sure we invalidate potential speculated, clean cache lines that are sitting there, or the IO subsystem is going to swap out the cached view, loosing the data that has been written directly into memory. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-29	arm/arm64: KVM: Use set/way op trapping to track the state of the caches	Marc Zyngier	14	-145/+161
	Trying to emulate the behaviour of set/way cache ops is fairly pointless, as there are too many ways we can end-up missing stuff. Also, there is some system caches out there that simply ignore set/way operations. So instead of trying to implement them, let's convert it to VA ops, and use them as a way to re-enable the trapping of VM ops. That way, we can detect the point when the MMU/caches are turned off, and do a full VM flush (which is what the guest was trying to do anyway). This allows a 32bit zImage to boot on the APM thingy, and will probably help bootloaders in general. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-29	perf symbols: Convert lseek + read to pread	Namhyung Kim	1	-5/+1
	When dso_cache__read() is called, it reads data from the given offset using lseek + normal read syscall. It can be combined to a single pread syscall. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1422518843-25818-40-git-send-email-namhyung@kernel.org [ Fixed it up when cherry picking it from the multi threaded patchkit ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-29	perf tools: Use perf_data_file__fd() consistently	Namhyung Kim	2	-9/+10
	Do not reference file->fd directly since we want hide the implementation details from outside for possible future changes. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1422518843-25818-8-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-29	perf symbols: Support to read compressed module from build-id cache	Namhyung Kim	1	-5/+8
	The commit c00c48fc6e6e ("perf symbols: Preparation for compressed kernel module support") added support for compressed kernel modules but it only supports system path DSOs. When a dso is read from build-id cache, its filename doesn't end with ".gz" but has build-id. In this case, we should fallback to the original dso->name. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1422518843-25818-2-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-29	perf evsel: Set attr.task bit for a tracking event	Namhyung Kim	1	-0/+1
	The perf_event_attr.task bit is to track task (fork and exit) events but it missed to be set by perf_evsel__config(). While it was not a problem in practice since setting other bits (comm/mmap) ended up being in same result, it'd be good to set it explicitly anyway. The attr->task is to track task related events (fork/exit) only but other meta events like comm and mmap[2] also needs the task events. So setting attr->comm and/or attr->mmap causes the kernel emits the task events anyway. So the attr->task is only meaningful when other bits are off but I'd like to set it for completeness. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1422518843-25818-6-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-29	perf header: Set header version correctly	Namhyung Kim	1	-1/+1
	When check_magic_endian() is called, it checks the magic number in the perf data file to determine version and endianness. But if it uses a same endian the verison number wasn't updated and makes confusion. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1422518843-25818-5-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-29	perf record: Show precise number of samples	Namhyung Kim	1	-12/+39
	After perf record finishes, it prints file size and number of samples in the file but this info is wrong since it assumes typical sample size of 24 bytes and divides file size by the value. However as we post-process recorded samples for build-id, it can show correct number like below. If build-id post-processing is not requested just omit the wrong number of samples. $ perf record noploop 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.159 MB perf.data (3989 samples) ] $ perf report --stdio -n # To display the perf.data header info, please use --header/--header-only options. # # Samples: 3K of event 'cycles' # Event count (approx.): 3771330663 # # Overhead Samples Command Shared Object Symbol # ........ ............ ....... ................ .......................... # 99.90% 3982 noploop noploop [.] main 0.09% 1 noploop ld-2.17.so [.] _dl_check_map_versions 0.01% 1 noploop [kernel.vmlinux] [k] setup_arg_pages 0.00% 5 noploop [kernel.vmlinux] [k] intel_pmu_enable_all Reported-by: Milian Wolff <mail@milianw.de> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1422518843-25818-4-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-29	perf tools: Do not use __perf_session__process_events() directly	Namhyung Kim	3	-10/+6
	It's only used for perf record to process build-id because its file size it's not fixed at this time due to remaining header features. However data offset and size is available so that we can use the perf_session__process_events() once we set the file size as the current offset like for now. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1422518843-25818-3-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-29	perf callchain: Cache eh/debug frame offset for dwarf unwind	Namhyung Kim	2	-11/+21
	When libunwind tries to resolve callchains it needs to know the offset of .eh_frame_hdr or .debug_frame to access the dso. Since it will always return the same result for a given DSO, just cache the result as an optimization. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1422518843-25818-41-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-29	vm: make stack guard page errors return VM_FAULT_SIGSEGV rather than SIGBUS	Linus Torvalds	1	-1/+1
	The stack guard page error case has long incorrectly caused a SIGBUS rather than a SIGSEGV, but nobody actually noticed until commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page") because that error case was never actually triggered in any normal situations. Now that we actually report the error, people noticed the wrong signal that resulted. So far, only the test suite of libsigsegv seems to have actually cared, but there are real applications that use libsigsegv, so let's not wait for any of those to break. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-29	arm: dma-mapping: Set DMA IOMMU ops in arm_iommu_attach_device()	Laurent Pinchart	1	-15/+38
	Commit 4bb25789ed28228a ("arm: dma-mapping: plumb our iommu mapping ops into arch_setup_dma_ops") moved the setting of the DMA operations from arm_iommu_attach_device() to arch_setup_dma_ops() where the DMA operations to be used are selected based on whether the device is connected to an IOMMU. However, the IOMMU detection scheme requires the IOMMU driver to be ported to the new IOMMU of_xlate API. As no driver has been ported yet, this effectively breaks all IOMMU ARM users that depend on the IOMMU being handled transparently by the DMA mapping API. Fix this by restoring the setting of DMA IOMMU ops in arm_iommu_attach_device() and splitting the rest of the function into a new internal __arm_iommu_attach_device() function, called by arch_setup_dma_ops(). Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Olof Johansson <olof@lixom.net>
2015-01-29	vm: add VM_FAULT_SIGSEGV handling support	Linus Torvalds	30	-7/+63
	The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-29	ARM: shmobile: r8a7790: Instantiate GIC from C board code in legacy builds	Magnus Damm	3	-0/+20
	As of commit 9a1091ef0017c40a ("irqchip: gic: Support hierarchy irq domain."), the Lager legacy board support is known to be broken. The IRQ numbers of the GIC are now virtual, and no longer match the hardcoded hardware IRQ numbers in the legacy platform board code. To fix this issue specific to non-multiplatform r8a7790 and Lager: 1) Instantiate the GIC from platform board code and also 2) Skip over the DT arch timer as well as 3) Force delay setup based on DT CPU frequency With these 3 fixes in place interrupts on Lager are now unbroken. Partially based on legacy GIC fix by Geert Uytterhoeven, thanks to him for the initial work. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2015-01-29	ARM: shmobile: r8a73a4: Instantiate GIC from C board code in legacy builds	Magnus Damm	2	-0/+27
	As of commit 9a1091ef0017c40a ("irqchip: gic: Support hierarchy irq domain."), the APE6EVM legacy board support is known to be broken. The IRQ numbers of the GIC are now virtual, and no longer match the hardcoded hardware IRQ numbers in the legacy platform board code. To fix this issue specific to non-muliplatform r8a73a4 and APE6EVM: 1) Instantiate the GIC from platform board code and also 2) Skip over the DT arch timer as well as 3) Force delay setup based on DT CPU frequency With these 3 fixes in place interrupts on APE6EVM are now unbroken. Partially based on legacy GIC fix by Geert Uytterhoeven, thanks to him for the initial work. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2015-01-28	ARM: mvebu: don't set the PL310 in I/O coherency mode when I/O coherency is disabled	Thomas Petazzoni	1	-0/+7
	Since commit f2c3c67f00 (merge commit that adds commit "ARM: mvebu: completely disable hardware I/O coherency"), we disable I/O coherency on Armada EBU platforms. However, we continue to initialize the coherency fabric, because this coherency fabric is needed on Armada XP for inter-CPU coherency. Unfortunately, due to this, we also continued to execute the coherency fabric initialization code for Armada 375/38x, which switched the PL310 into I/O coherent mode. This has the effect of disabling the outer cache sync operation: this is needed when I/O coherency is enabled to work around a PCIe/L2 deadlock. But obviously, when I/O coherency is disabled, having the outer cache sync operation is crucial. Therefore, this commit fixes the armada_375_380_coherency_init() so that the PL310 is switched to I/O coherent mode only if I/O coherency is enabled. Without this fix, all devices using DMA are broken on Armada 375/38x. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Cc: <stable@vger.kernel.org> # v3.8+
2015-01-28	perf tools: Provide stub for missing pthread_attr_setaffinity_np	Vineet Gupta	5	-0/+42
	uClibc Linuxthreads.old doesn't support the pthread_attr_setaffinity_np() functioo: ----------------->8----------------------- CC bench/futex-hash.o CC bench/futex-wake.o bench/futex-hash.c: In function 'bench_futex_hash': bench/futex-hash.c:161:3: error: implicit declaration of function 'pthread_attr_setaffinity_np' [-Werror=implicit-function-declaration] ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu); ^ bench/futex-hash.c:161:3: error: nested extern declaration of 'pthread_attr_setaffinity_np' [-Werror=nested-externs] ----------------->8----------------------- So introduce a test to check that and if not available provide a stub. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421156604-30603-6-git-send-email-vgupta@synopsys.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-28	perf evsel: Don't rely on malloc working for sz 0	Vineet Gupta	1	-0/+3
	When running perf on ARC (uClibc based userspace), ran into this issue ------------->8---------------- [ARCLinux]$ ./perf record ls bin etc perf sys debug init perf.data tmp [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (~24 samples) ] [ARCLinux]$ ./perf report incompatible file format (rerun with -v to learn more) ------------->8---------------- The problem happens in the following call stack when zalloc is called with size zero glibc default / uClibc with MALLOC_GLIBC_COMPAT are OK, but not if that config option is not enabled. cmd_report perf_session__new perf_session__open perf_session__read_header read_attr(fd, header, &f_attr) nr_ids = f_attr.ids.size / sizeof(u64); <-- 0 perf_evsel__alloc_id(vsel, 1, nr_ids) zalloc(ncpus * nthreads * sizeof(u64)) <-- 0 header.c: read_attr() (gdb) p *f_attr $17 = { attr = { type = 0, size = 96, config = 0, { sample_period = 4000, sample_freq = 4000 }, ... ids = { offset = 104, size = 0 <------ } } Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421156604-30603-5-git-send-email-vgupta@synopsys.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-28	dm thin: don't allow messages to be sent to a pool target in READ_ONLY or FAIL mode	Joe Thornber	1	-0/+6
	You can't modify the metadata in these modes. It's better to fail these messages immediately than let the block-manager deny write locks on metadata blocks. Otherwise these failed metadata changes will trigger 'needs_check' to get set in the metadata superblock -- requiring repair using the thin_check utility. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-01-28	dm cache: fix missing ERR_PTR returns and handling	Joe Thornber	1	-4/+5
	Commit 9b1cc9f251 ("dm cache: share cache-metadata object across inactive and active DM tables") mistakenly ignored the use of ERR_PTR returns. Restore missing IS_ERR checks and ERR_PTR returns where appropriate. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-01-28	rbd: drop parent_ref in rbd_dev_unprobe() unconditionally	Ilya Dryomov	1	-4/+1
	This effectively reverts the last hunk of 392a9dad7e77 ("rbd: detect when clone image is flattened"). The problem with parent_overlap != 0 condition is that it's possible and completely valid to have an image with parent_overlap == 0 whose parent state needs to be cleaned up on unmap. The next commit, which drops the "clone image now standalone" logic, opens up another window of opportunity to hit this, but even without it # cat parent-ref.sh #!/bin/bash rbd create --image-format 2 --size 1 foo rbd snap create foo@snap rbd snap protect foo@snap rbd clone foo@snap bar rbd resize --allow-shrink --size 0 bar rbd resize --size 1 bar DEV=$(rbd map bar) rbd unmap $DEV leaves rbd_device/rbd_spec/etc and rbd_client along with ceph_client hanging around. My thinking behind calling rbd_dev_parent_put() unconditionally is that there shouldn't be any requests in flight at that point in time as we are deep into unmap sequence. Hence, even if rbd_dev_unparent() caused by flatten is delayed by in-flight requests, it will have finished by the time we reach rbd_dev_unprobe() caused by unmap, thus turning unconditional rbd_dev_parent_put() into a no-op. Fixes: http://tracker.ceph.com/issues/10352 Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-01-28	rbd: fix rbd_dev_parent_get() when parent_overlap == 0	Ilya Dryomov	1	-14/+6
	The comment for rbd_dev_parent_get() said * We must get the reference before checking for the overlap to * coordinate properly with zeroing the parent overlap in * rbd_dev_v2_parent_info() when an image gets flattened. We * drop it again if there is no overlap. but the "drop it again if there is no overlap" part was missing from the implementation. This lead to absurd parent_ref values for images with parent_overlap == 0, as parent_ref was incremented for each img_request and virtually never decremented. Fix this by leveraging the fact that refresh path calls rbd_dev_v2_parent_info() under header_rwsem and use it for read in rbd_dev_parent_get(), instead of messing around with atomics. Get rid of barriers in rbd_dev_v2_parent_info() while at it - I don't see what they'd pair with now and I suspect we are in a pretty miserable situation as far as proper locking goes regardless. Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-01-28	perf: Tighten (and fix) the grouping condition	Peter Zijlstra	2	-8/+13
	The fix from 9fc81d87420d ("perf: Fix events installation during moving group") was incomplete in that it failed to recognise that creating a group with events for different CPUs is semantically broken -- they cannot be co-scheduled. Furthermore, it leads to real breakage where, when we create an event for CPU Y and then migrate it to form a group on CPU X, the code gets confused where the counter is programmed -- triggered in practice as well by me via the perf fuzzer. Fix this by tightening the rules for creating groups. Only allow grouping of counters that can be co-scheduled in the same context. This means for the same task and/or the same cpu. Fixes: 9fc81d87420d ("perf: Fix events installation during moving group") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28	perf/x86/intel: Add model number for Airmont	Kan Liang	1	-0/+1
	Intel Airmont supports the same architectural and non-architectural performance monitoring events as Silvermont. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421913053-99803-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28	perf/rapl: Fix crash in rapl_scale()	Stephane Eranian	1	-1/+1
	This patch fixes a systematic crash in rapl_scale() due to an invalid pointer. The bug was introduced by commit: 89cbc76768c2 ("x86: Replace __get_cpu_var uses") The fix is simple. Just put the parenthesis where it needs to be, i.e., around rapl_pmu. To my surprise, the compiler was not complaining about passing an integer instead of a pointer. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Fixes: 89cbc76768c2 ("x86: Replace __get_cpu_var uses") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: cl@linux.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150122203834.GA10228@thinkpad Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28	perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization	Kan Liang	2	-15/+12
	There were some issues about the uncore driver tried to access non-existing boxes, which caused boot crashes. These issues have been all fixed. But we should avoid boot failures if that ever happens again. This patch intends to prevent this kind of potential issues. It moves uncore_box_init out of driver initialization. The box will be initialized when it's first enabled. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421729665-5912-1-git-send-email-kan.liang@intel.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28	quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units	Jan Kara	8	-195/+318
	Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which tracks space limits and usage in 512-byte blocks. However VFS quotas track usage in bytes (as some filesystems require that) and we need to somehow pass this information. Upto now it wasn't a problem because we didn't do any unit conversion (thus VFS quota routines happily stuck number of bytes into d_bcount field of struct fd_disk_quota). Only if you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone tried this but reportedly some Samba users hit the problem in practice. So when we want interfaces compatible we need to fix this. We bite the bullet and define another quota structure used for passing information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have to have more conversion routines in fs/quota/quota.c and another copying of quota structure slows down getting of quota information by about 2% but it seems cleaner than overloading e.g. units of d_bcount to bytes. CC: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-28	udf: Release preallocation on last writeable close	Jan Kara	1	-1/+1
	Commit 6fb1ca92a640 "udf: Fix race between write(2) and close(2)" changed the condition when preallocation is released. The idea was that we don't want to release the preallocation for an inode on close when there are other writeable file descriptors for the inode. However the condition was written in the opposite way so we released preallocation only if there were other writeable file descriptors. Fix the problem by changing the condition properly. CC: stable@vger.kernel.org Fixes: 6fb1ca92a6409a9d5b0696447cd4997bc9aaf5a2 Reported-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-27	btrfs: fix raid56 scrub failed in xfstests btrfs/072	Gui Hecheng	1	-0/+2
	The xfstests btrfs/072 reports uncorrectable read errors in dmesg, because scrub forgets to use commit_root for parity scrub routine and scrub attempts to scrub those extents items whose contents are not fully on disk. To fix it, we just add the @search_commit_root flag back. Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-27	net: don't OOPS on socket aio	Christoph Hellwig	1	-3/+0
	Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27	stmmac: prevent probe drivers to crash kernel	Andy Shevchenko	1	-1/+4
	In the case when alloc_netdev fails we return NULL to a caller. But there is no check for NULL in the probe drivers. This patch changes NULL to an error pointer. The function description is amended to reflect what we may get returned. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27	bnx2x: fix napi poll return value for repoll	Govindarajulu Varadarajan	1	-1/+1
	With the commit d75b1ade567ffab ("net: less interrupt masking in NAPI") napi repoll is done only when work_done == budget. When in busy_poll is we return 0 in napi_poll. We should return budget. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27	ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too	Hannes Frederic Sowa	1	-19/+26
	Lubomir Rintel reported that during replacing a route the interface reference counter isn't correctly decremented. To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>: \| [root@rhel7-5 lkundrak]# sh -x lal \| + ip link add dev0 type dummy \| + ip link set dev0 up \| + ip link add dev1 type dummy \| + ip link set dev1 up \| + ip addr add 2001:db8:8086::2/64 dev dev0 \| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20 \| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10 \| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20 \| + ip link del dev0 type dummy \| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ... \| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 \| \| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ... \| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 During replacement of a rt6_info we must walk all parent nodes and check if the to be replaced rt6_info got propagated. If so, replace it with an alive one. Fixes: 4a287eba2de3957 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag") Reported-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27	sh_eth: Fix DMA-API usage for RX buffers	Ben Hutchings	1	-11/+23
	- Use the return value of dma_map_single(), rather than calling virt_to_page() separately - Check for mapping failue - Call dma_unmap_single() rather than dma_sync_single_for_cpu() Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27	sh_eth: Check for DMA mapping errors on transmit	Ben Hutchings	1	-0/+4
	dma_map_single() may fail if an IOMMU or swiotlb is in use, so we need to check for this. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27	sh_eth: Ensure DMA engines are stopped before freeing buffers	Ben Hutchings	1	-7/+32
	Currently we try to clear EDRRR and EDTRR and immediately continue to free buffers. This is unsafe because: - In general, register writes are not serialised with DMA, so we still have to wait for DMA to complete somehow - The R8A7790 (R-Car H2) manual states that the TX running flag cannot be cleared by writing to EDTRR - The same manual states that clearing the RX running flag only stops RX DMA at the next packet boundary I applied this patch to the driver to detect DMA writes to freed buffers: > --- a/drivers/net/ethernet/renesas/sh_eth.c > +++ b/drivers/net/ethernet/renesas/sh_eth.c > @@ -1098,7 +1098,14 @@ static void sh_eth_ring_free(struct net_device ndev) > / Free Rx skb ringbuffer */ > if (mdp->rx_skbuff) { > for (i = 0; i < mdp->num_rx_ring; i++) > + memcpy(mdp->rx_skbuff[i]->data, > + "Hello, world", 12); > + msleep(100); > + for (i = 0; i < mdp->num_rx_ring; i++) { > + WARN_ON(memcmp(mdp->rx_skbuff[i]->data, > + "Hello, world", 12)); > dev_kfree_skb(mdp->rx_skbuff[i]); > + } > } > kfree(mdp->rx_skbuff); > mdp->rx_skbuff = NULL; then ran the loop: while ethtool -G eth0 rx 128 ; ethtool -G eth0 rx 64; do echo -n .; done and 'ping -f' toward the sh_eth port from another machine. The warning fired several times a minute. To fix these issues: - Deactivate all TX descriptors rather than writing to EDTRR - As there seems to be no way of telling when RX DMA is stopped, perform a soft reset to ensure that both DMA enginess are stopped - To reduce the possibility of the reset truncating a transmitted frame, disable egress and wait a reasonable time to reach a packet boundary before resetting - Update statistics before resetting (The 'reasonable time' does not allow for CS/CD in half-duplex mode, but half-duplex no longer seems reasonable!) Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>