path: root/arch
2014-03-06  fs/compat: optional preadv64/pwrite64 compat system calls  (Heiko Carstens; 1 file, -0/+2)

The preadv64/pwrite64 system calls have been implemented for the x32 ABI, in order to allow passing 64 bit arguments from user space without splitting them into two 32 bit parameters, as would be necessary for usual compat tasks. However, these two system calls are only used by the x32 ABI, so add __ARCH_WANT_COMPAT defines for these two compat syscalls and make them visible for x86 only.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
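A minimal sketch of the opt-in pattern this describes; the macro names follow the kernel's __ARCH_WANT_COMPAT_SYS_* convention and the wrapper body calls a hypothetical helper, so treat it as illustrative rather than the literal patch:

    /* arch/x86/include/asm/unistd.h: x86 opts in */
    #define __ARCH_WANT_COMPAT_SYS_PREADV64
    #define __ARCH_WANT_COMPAT_SYS_PWRITEV64

    /* common code: the compat wrapper is only compiled where requested */
    #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
    COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
                           const struct compat_iovec __user *, vec,
                           unsigned long, vlen, loff_t, pos)
    {
            return do_compat_preadv64(fd, vec, vlen, pos); /* hypothetical helper */
    }
    #endif

Note that loff_t can appear as a single parameter here precisely because x32 passes full 64-bit register arguments.
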
2014-03-04  s390/compat: partial parameter conversion within syscall wrappers  (Heiko Carstens; 1 file, -2/+17)

Parameter conversion within the system call wrappers is only needed for parameters which differ in size and have a size of eight bytes on 64 bit. For system call parameters with a size of less than eight bytes, the called system call itself will perform parameter conversion anyway, so we can save the double conversion of e.g. int parameters. The only types which need to be converted are therefore pointer and (unsigned) long parameters.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
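To make the rule concrete, an illustrative hand-written wrapper under those conventions (the callee is a hypothetical stand-in; the real wrappers are macro-generated):

    #include <linux/compat.h>

    /* hypothetical callee standing in for a native syscall */
    long do_example(unsigned long addr, long offset, void __user *buf, int flags);

    static long example_wrapper(compat_ulong_t addr, compat_long_t offset,
                                compat_uptr_t buf, int flags)
    {
            unsigned long kaddr = addr;          /* zero-extended        */
            long koffset = offset;               /* sign-extended        */
            void __user *kbuf = compat_ptr(buf); /* pointer conversion   */

            /* 'flags' (int) passes through untouched: the callee only
             * reads its low 32 bits, so converting it twice is wasted work */
            return do_example(kaddr, koffset, kbuf, flags);
    }
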
2014-03-04  s390/compat: automatic zero, sign and pointer conversion of syscalls  (Heiko Carstens; 1 file, -59/+75)

Instead of explicitly changing compat system call parameters from e.g. unsigned long to compat_ulong_t, let the COMPAT_SYSCALL_WRAP macros automatically detect (unsigned) long parameters and zero and sign extend them automatically. The resulting binary is completely identical.

In addition, add a sys_[system call name] prototype for each system call wrapper. This will cause compile errors if the prototype does not match the prototype in include/linux/syscalls.h. Therefore we should now always get the correct zero and sign extension of system call parameters. Pointers are handled as before.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
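The detection can be done with compiler builtins; the statement-expression sketch below mirrors the approach (the in-tree COMPAT_SYSCALL_WRAPx machinery differs in detail, so check arch/s390/kernel/compat_wrapper.c for the real macros):

    /* true for pointer types: 0 ? (t)0 : 0ULL has pointer type iff t is one */
    #define __TYPE_IS_PTR(t) \
            (!__builtin_types_compatible_p(typeof(0 ? (t)0 : 0ULL), u64))

    #define __SC_COMPAT_CAST(t, a)                                    \
    ({                                                                \
            long __ReS = (long)(a);                                   \
            if (__builtin_types_compatible_p(t, long))                \
                    __ReS = (long)(s32)(a);    /* sign extend     */  \
            if (__builtin_types_compatible_p(t, unsigned long))       \
                    __ReS = (long)(u32)(a);    /* zero extend     */  \
            if (__TYPE_IS_PTR(t))                                     \
                    __ReS = (a) & 0x7fffffffUL; /* 31-bit address */  \
            (t)__ReS;                                                 \
    })
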
2014-03-04  s390/compat: add sync_file_range and fallocate compat syscalls  (Heiko Carstens; 4 files, -20/+19)

The compat syscall wrappers for sync_file_range and fallocate merged 32 bit parameters into 64 bit parameters. Therefore they did more than just the usual zero and/or sign extension of system call parameters, so convert these two wrappers to full s390 specific compat system calls.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
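A sketch of the kind of s390-specific compat syscall this results in; the exact signature should be checked against arch/s390/kernel/compat_linux.c:

    COMPAT_SYSCALL_DEFINE6(s390_sync_file_range, int, fd,
                           u32, offhigh, u32, offlow,
                           u32, nhigh, u32, nlow, unsigned int, flags)
    {
            /* reassemble the two 32-bit register halves into 64-bit values */
            return sys_sync_file_range(fd,
                            ((loff_t)offhigh << 32) + offlow,
                            ((loff_t)nhigh << 32) + nlow, flags);
    }
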
2014-03-04  s390/compat: convert system call wrappers to C part 15  (Heiko Carstens; 4 files, -86/+25)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 14  (Heiko Carstens; 3 files, -60/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 13  (Heiko Carstens; 3 files, -77/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 12  (Heiko Carstens; 3 files, -76/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 11  (Heiko Carstens; 4 files, -58/+21)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 10  (Heiko Carstens; 3 files, -70/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 09  (Heiko Carstens; 3 files, -67/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 08  (Heiko Carstens; 3 files, -66/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 07  (Heiko Carstens; 3 files, -76/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 06  (Heiko Carstens; 3 files, -61/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 05  (Heiko Carstens; 4 files, -65/+23)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 04  (Heiko Carstens; 3 files, -67/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 03  (Heiko Carstens; 3 files, -65/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert system call wrappers to C part 02  (Heiko Carstens; 3 files, -55/+20)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-04  s390/compat: convert system call wrappers to C part 01  (Heiko Carstens; 4 files, -68/+48)

Introduce a new compat_wrap.c file which contains the s390 specific compat system call wrappers. The s390 specific system call wrappers only perform sign, zero and pointer conversion of system call arguments before actually calling the non-compat system call. Therefore introduce COMPAT_SYSCALL_WRAPx macros which generate C code that is nearly identical to the assembly code. This has the advantage that the compiler will generate correct code, and we avoid the frequent copy-paste errors seen in the compat_wrapper.S file.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
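A sketch of what such a wrapper macro provides, using munmap as an arbitrary example; the expansion shown is simplified, not the literal macro output:

    COMPAT_SYSCALL_WRAP2(munmap, unsigned long, addr, size_t, len);

    /* ...which expands to roughly: */
    asmlinkage long compat_sys_munmap(long addr, long len)
    {
            return sys_munmap((unsigned long)(u32)addr,  /* zero extend */
                              (size_t)(u32)len);
    }
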
2014-03-04  s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 7  (Heiko Carstens; 4 files, -42/+13)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 6  (Heiko Carstens; 4 files, -43/+16)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 5  (Heiko Carstens; 4 files, -60/+21)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 4  (Heiko Carstens; 4 files, -34/+16)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 3  (Heiko Carstens; 4 files, -39/+16)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

2014-03-04  s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 2  (Heiko Carstens; 4 files, -44/+18)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-04  s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 1  (Heiko Carstens; 4 files, -42/+17)

Convert s390 specific system calls to the new COMPAT_SYSCALL_DEFINE macro. This allows us to get rid of the assembly compat wrappers.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
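An illustrative before/after; the syscall is a stand-in, not necessarily one converted by this series:

    /* before: an assembly stub in compat_wrapper.S sign/zero-extended the
     * arguments and branched to the native entry point */

    /* after: the macro generates the compat entry point in C with typed,
     * compiler-checked parameters */
    COMPAT_SYSCALL_DEFINE3(s390_example, unsigned int, fd,
                           compat_ulong_t, len, compat_uptr_t, ubuf)
    {
            return do_example(fd, len, compat_ptr(ubuf)); /* hypothetical callee */
    }
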
2014-03-04  compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64  (Heiko Carstens; 3 files, -1/+2)

For architecture dependent compat syscalls in common code, an architecture must define something like __ARCH_WANT_<WHATEVER> if it wants to use the code. This however is not true for compat_sys_getdents64, for which architectures must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code. This leads to the situation where all architectures except mips get the compat code, but only x86_64, arm64 and the generic syscall architectures actually use it. So invert the logic, so that architectures must actively do something to get the compat code. This way a couple of architectures get rid of otherwise dead code.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
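A sketch of the inverted opt-in; the guard macro name is from the patch title, the surrounding placement is illustrative:

    /* per-arch header, e.g. on x86_64 or arm64: */
    #define __ARCH_WANT_COMPAT_SYS_GETDENTS64

    /* common code: compiled only for architectures that opted in */
    #ifdef __ARCH_WANT_COMPAT_SYS_GETDENTS64
    COMPAT_SYSCALL_DEFINE3(getdents64, unsigned int, fd,
                           struct linux_dirent64 __user *, dirent,
                           unsigned int, count)
    {
            /* ... */
    }
    #endif
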
2014-03-02  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -0/+3)

Pull perf fixes from Ingo Molnar: "Misc fixes, most of them on the tooling side"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fix strict alias issue for find_first_bit
  perf tools: fix BFD detection on opensuse
  perf: Fix hotplug splat
  perf/x86: Fix event scheduling
  perf symbols: Destroy unused symsrcs
  perf annotate: Check availability of annotate when processing samples
2014-03-01  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 2 files, -4/+7)

Pull x86 fixes from Peter Anvin: "The VMCOREINFO patch I'm pushing for this release to avoid having a release with kASLR but without that information. I was hoping to include the FPU patches from Suresh, but ran into a problem (see other thread); will try to make them happen next week"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kaslr: add missed "static" declarations
  x86, kaslr: export offset in VMCOREINFO ELF notes
2014-02-28  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds; 6 files, -6/+40)

Pull KVM fixes from Paolo Bonzini: "Three x86 fixes and one for ARM/ARM64. In particular, nested virtualization on Intel is broken in 3.13 and fixed by this pull request"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm, vmx: Really fix lazy FPU on nested guest
  kvm: x86: fix emulator buffer overflow (CVE-2014-0049)
  arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT
  KVM: MMU: drop read-only large sptes when creating lower level sptes
2014-02-28  Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds; 3 files, -6/+18)

Pull ARM64 fixes from Catalin Marinas:
 - !CONFIG_SMP build fix
 - pte bit testing macros conversion fix (int truncates top bits of long)
 - stack unwinding PC calculation fix

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Fix !CONFIG_SMP kernel build
  arm64: mm: Add double logical invert to pte accessors
  ARM64: unwind: Fix PC calculation
2014-02-28  Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc  (Linus Torvalds; 10 files, -178/+219)

Pull powerpc fixes from Ben Herrenschmidt: "Here are a few more powerpc fixes for 3.14. Most of these are also CC'ed to stable and fix bugs in new functionality introduced in the last 2 or 3 versions"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/powernv: Fix indirect XSCOM unmangling
  powerpc/powernv: Fix opal_xscom_{read,write} prototype
  powerpc/powernv: Refactor PHB diag-data dump
  powerpc/powernv: Dump PHB diag-data immediately
  powerpc: Increase stack redzone for 64-bit userspace to 512 bytes
  powerpc/ftrace: bugfix for test_24bit_addr
  powerpc/crashdump: Fix page frame number check in copy_oldmem_page
  powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly
2014-02-28  arm64: Fix !CONFIG_SMP kernel build  (Catalin Marinas; 1 file, -0/+8)

Commit fb4a96029c8a (arm64: kernel: fix per-cpu offset restore on resume) uses per_cpu_offset() unconditionally during CPU wakeup; however, this is only defined for the SMP case.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Dave P Martin <Dave.Martin@arm.com>
2014-02-28  arm64: mm: Add double logical invert to pte accessors  (Steve Capper; 1 file, -5/+5)

Page table entries on ARM64 are 64 bits, and some pte functions such as pte_dirty return a bitwise-and of a flag with the pte value. If the flag to be tested resides in the upper 32 bits of the pte, then we run into the danger of the result being dropped if downcast. For example:

  gather_stats(page, md, pte_dirty(*pte), 1);

where pte_dirty(*pte) is downcast to an int. This patch adds a double logical invert to all the pte_ accessors to ensure predictable downcasting.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
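A sketch of the pattern, assuming a dirty flag that sits in the upper word (bit 55 here is illustrative):

    #define PTE_DIRTY (1UL << 55)   /* example: flag in the upper 32 bits */

    static inline int pte_dirty(pte_t pte)
    {
            /* '!!' collapses the 64-bit mask result to 0 or 1, so the
             * implicit downcast to int can no longer drop the flag */
            return !!(pte_val(pte) & PTE_DIRTY);
    }
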
2014-02-28  powerpc/powernv: Fix indirect XSCOM unmangling  (Benjamin Herrenschmidt; 1 file, -9/+12)

We need to unmangle the full address, not just the register number, and we also need to support the real indirect bit being set for in-kernel uses.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.13]
2014-02-28  powerpc/powernv: Fix opal_xscom_{read,write} prototype  (Benjamin Herrenschmidt; 1 file, -2/+2)

The OPAL firmware functions opal_xscom_read and opal_xscom_write take a 64-bit argument for the XSCOM (PCB) address in order to support the indirect mode on P8.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.13]
2014-02-28  powerpc/powernv: Refactor PHB diag-data dump  (Gavin Shan; 1 file, -95/+125)

As Ben suggested, the patch prints PHB diag-data with multiple fields in one line and omits the line if the fields of that line are all zero. With the patch applied, the PHB3 diag-data dump looks like:

  PHB3 PHB#3 Diag-data (Version: 1)
  brdgCtl:     00000002
  RootSts:     0000000f 00400000 b0830008 00100147 00002000
  nFir:        0000000000000000 0030006e00000000 0000000000000000
  PhbSts:      0000001c00000000 0000000000000000
  Lem:         0000000000100000 42498e327f502eae 0000000000000000
  InAErr:      8000000000000000 8000000000000000 0402030000000000 0000000000000000
  PE[  8] A/B: 8480002b00000000 8000000000000000

[ The current diag data is so big that it overflows the printk buffer pretty quickly in cases when we get a handful of errors at once, which can happen. --BenH ]

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-28  powerpc/powernv: Dump PHB diag-data immediately  (Gavin Shan; 1 file, -53/+43)

The PHB diag-data is important for locating the root cause of EEH errors such as a frozen PE or fenced PHB. However, the EEH core enables the IO path by clearing part of the HW registers before collecting this data, causing it to be corrupted. This patch fixes this by dumping the PHB diag-data immediately when a frozen/fenced state on a PE or PHB is detected for the first time in the eeh_ops::get_state() or next_error() backend.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-28  powerpc: Increase stack redzone for 64-bit userspace to 512 bytes  (Paul Mackerras; 3 files, -5/+20)

The new ELFv2 little-endian ABI increases the stack redzone -- the area below the stack pointer that can be used for storing data -- from 288 bytes to 512 bytes. This means that we need to allow more space on the user stack when delivering a signal to a 64-bit process. To make the code a bit clearer, we define new USER_REDZONE_SIZE and KERNEL_REDZONE_SIZE symbols in ptrace.h.

For now, we leave the kernel redzone size at 288 bytes, since increasing it to 512 bytes would increase the size of interrupt stack frames correspondingly. Gcc currently only makes use of 288 bytes of redzone even when compiling for the new little-endian ABI, and the kernel cannot currently be compiled with the new ABI anyway. In the future, hopefully gcc will provide an option to control the amount of redzone used, and then we could reduce it even more.

This also changes the code in arch_compat_alloc_user_space() to preserve the expanded redzone. It is not clear why this function would ever be used on a 64-bit process, though.

Signed-off-by: Paul Mackerras <paulus@samba.org>
CC: <stable@vger.kernel.org> [v3.13]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
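Per the message above, the new symbols amount to something like this (values as stated in the commit text; exact placement in arch/powerpc/include/asm/ptrace.h per the patch description):

    #ifdef __powerpc64__
    #define USER_REDZONE_SIZE       512     /* ELFv2 little-endian ABI redzone */
    #define KERNEL_REDZONE_SIZE     288     /* kernel keeps the old size for now */
    #endif
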
2014-02-28  powerpc/ftrace: bugfix for test_24bit_addr  (Liu Ping Fan; 1 file, -0/+1)

The branch target should be the function address, not the address of the func_descr_t. So use ppc_function_entry() to generate the right target address.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
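For background, a simplified sketch of the descriptor indirection involved; treat field names and the helper body as illustrative of the powerpc convention rather than the exact in-tree code:

    typedef struct {
            unsigned long entry;    /* address of the function text   */
            unsigned long toc;      /* TOC pointer for the callee     */
            unsigned long env;
    } func_descr_t;

    static inline unsigned long ppc_function_entry(void *func)
    {
    #ifdef CONFIG_PPC64
            /* on PPC64 (ELFv1) a function pointer points at the descriptor */
            return ((func_descr_t *)func)->entry;
    #else
            return (unsigned long)func;
    #endif
    }
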
2014-02-28  powerpc/crashdump: Fix page frame number check in copy_oldmem_page  (Laurent Dufour; 1 file, -3/+5)

In copy_oldmem_page, the current check using max_pfn and min_low_pfn to decide whether the page is backed or not is not valid when the memory layout is not continuous. This happens when running as a QEMU/KVM guest, where RTAS is mapped higher in memory. In that case max_pfn points to the end of RTAS, and a hole between the end of the kdump kernel and RTAS is not backed by PTEs. As a consequence, the kdump kernel crashes in copy_oldmem_page when directly accessing the pages in that hole.

This fix relies on memblock's memblock_is_region_memory service to check whether the read page is part of the directly accessible memory.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
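A minimal sketch of the check, assuming a helper like the one below is folded into copy_oldmem_page:

    #include <linux/memblock.h>

    /* true only if the whole range is covered by directly accessible memory */
    static bool pfn_is_backed(unsigned long pfn, size_t csize)
    {
            phys_addr_t paddr = (phys_addr_t)pfn << PAGE_SHIFT;

            return memblock_is_region_memory(paddr, csize);
    }
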
2014-02-28  powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly  (Tony Breeds; 1 file, -11/+11)

Currently we're storing a host endian RTAS token in rtas_stop_self_args.token. We then pass that directly to rtas. This is fine on big endian; on little endian, however, the token is not what we expect. This will typically result in hitting:

  panic("Alas, I survived.\n");

To fix this we always use the stop-self token in host order and always convert it to be32 before passing it to rtas.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
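A sketch of the convention the fix establishes (names illustrative): the token stays host-endian while stored and is converted exactly once, at the RTAS call boundary:

    struct stop_self_args_sketch {
            u32 token;      /* kept in host (cpu) endianness */
    };

    static void fill_rtas_args(struct rtas_args *args,
                               struct stop_self_args_sketch *s)
    {
            /* RTAS expects big-endian values: convert at the last moment */
            args->token = cpu_to_be32(s->token);
    }
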
2014-02-27  kvm, vmx: Really fix lazy FPU on nested guest  (Paolo Bonzini; 1 file, -1/+1)

Commit e504c9098ed6 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13) highlighted a real problem, but the fix was subtly wrong.

nested_read_cr0 is the CR0 as read by L2, but here we want to look at the CR0 value reflecting L1's setup. In other words, L2 might think that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually running it with TS=1, we should inject the fault into L1. The effective value of CR0 in L2 is contained in vmcs12->guest_cr0; use it.

Fixes: e504c9098ed6acd9e1079c5e10e4910724ad429f
Reported-by: Kashyap Chamarty <kchamart@redhat.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Kashyap Chamarty <kchamart@redhat.com>
Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-27  kvm: x86: fix emulator buffer overflow (CVE-2014-0049)  (Andrew Honig; 1 file, -1/+1)

The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address.

When doing repeated emulated pushes, emulator_read_write sets mmio_needed to 1 on the first one. On a later push when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_needed is not set to 0. As a result, KVM exits to userspace and then returns to complete_emulated_mmio. In complete_emulated_mmio, vcpu->mmio_cur_fragment is incremented. The termination condition vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved, so the code bounces back and forth to userspace, incrementing mmio_cur_fragment past its buffer. If the guest does nothing else, this eventually leads to a crash on a memcpy from an invalid memory address.

However, if guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data pointed to by the call to cancel_work_item, which can be used to gain execution.

Fixes: f78146b0f9230765c6315b2e14f56112513389ad
Signed-off-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org (3.5+)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-27  arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT  (Marc Zyngier; 3 files, -4/+37)

Commit 1fcf7ce0c602 (arm: kvm: implement CPU PM notifier) added support for CPU power management, using a cpu_notifier to re-init KVM on a CPU that entered CPU idle. The code assumed that a CPU entering idle would actually be powered off, losing its state entirely, and would then need to be reinitialized.

It turns out that this is not always the case, and some HW performs CPU PM without actually killing the core. In this case, we try to reinitialize KVM while it is still live. It ends up badly, as reported by Andre Przywara (using a Calxeda Midway):

  [ 3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760
  [ 3.663897] unexpected data abort in Hyp mode at: 0xc067d150
  [ 3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0

The trick here is to detect whether we've been through a full re-init or not by looking at HVBAR (VBAR_EL2 on arm64). This involves implementing the backend for __hyp_get_vectors in the main KVM HYP code (rather small), and checking the return value against the default one when the CPU notifier is called on CPU_PM_EXIT.

Reported-by: Andre Przywara <osp@andrep.de>
Tested-by: Andre Przywara <osp@andrep.de>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <rob.herring@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
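A sketch of the notifier-side check this describes, simplified from the shape of the KVM ARM CPU PM notifier (treat names and return values as illustrative):

    static int hyp_init_cpu_pm_notifier_sketch(struct notifier_block *self,
                                               unsigned long cmd, void *v)
    {
            if (cmd == CPU_PM_EXIT &&
                __hyp_get_vectors() == hyp_default_vectors) {
                    /* HVBAR/VBAR_EL2 still holds the default vectors: the
                     * core really lost its state, so re-init KVM here */
                    cpu_init_hyp_mode(NULL);
                    return NOTIFY_OK;
            }
            return NOTIFY_DONE;
    }
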
2014-02-27  perf/x86: Fix event scheduling  (Peter Zijlstra; 1 file, -0/+3)

Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures, with perf WARN_ON()s triggering. He also provided traces of the failures. This is, I think, the relevant bit:

  > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable
  > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: {
  > pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
  > pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
  > pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: }
  > pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
  > pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: {
  > pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
  > pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: }
  > pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800)

So we add the insn:p event (fd[23]). At this point we should have:

  n_events = 2, n_added = 1, n_txn = 1

  > pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
  > pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)

We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]). These events are 0:cycles and 4:br_insn; the BP event isn't x86_pmu so that's not visible.

  group_sched_in()
    pmu->start_txn() /* nop - BP pmu */
    event_sched_in()
      event->pmu->add()

So here we should end up with:

  0: n_events = 3, n_added = 2, n_txn = 2
  4: n_events = 4, n_added = 3, n_txn = 3

But seeing the below state on x86_pmu_enable(), they must have failed, because the 0 and 4 events aren't there anymore. Looking at group_sched_in(), since the BP is the leader, its event_sched_in() must have succeeded, for otherwise we would not have seen the sibling adds. But since neither 0 nor 4 are in the below state, their event_sched_in() must have failed; but I don't see why, the complete state: 0,0,1:p,4 fits perfectly fine on a core2.

However, since we try and schedule 4 it means the 0 event must have succeeded! Therefore the 4 event must have failed; its failure will have put group_sched_in() into the fail path, which will call:

  event_sched_out()
    event->pmu->del()

on 0 and the BP event. Now x86_pmu_del() will reduce n_events, but it will not reduce n_added, giving what we see below:

  n_event = 2, n_added = 2, n_txn = 2

  > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable
  > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: {
  > pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
  > pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
  > pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: }
  > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
  > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: {
  > pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
  > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800)
  > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: }
  > pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0

So the problem is that x86_pmu_del(), when called from a group_sched_in() that fails (for whatever reason), and without x86_pmu TXN support (because the leader is !x86_pmu), will corrupt the n_added state.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
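A sketch consistent with that analysis (simplified from the shape of x86_pmu_del(); verify against the actual patch): when an event being removed sits in the uncommitted tail of event_list, n_added has to shrink along with n_events.

    for (i = 0; i < cpuc->n_events; i++) {
            if (event == cpuc->event_list[i]) {
                    /* event was added but never committed:
                     * undo the add count as well */
                    if (i >= cpuc->n_events - cpuc->n_added)
                            --cpuc->n_added;
                    /* ... existing removal path ... */
            }
    }
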
2014-02-26  KVM: MMU: drop read-only large sptes when creating lower level sptes  (Marcelo Tosatti; 1 file, -0/+1)

Read-only large sptes can be created due to read-only faults as follows:

 - A QEMU pagetable entry that maps guest memory is read-only due to COW.
 - The guest read-faults such memory; COW is not broken, because it is a read-only fault.
 - Dirty logging is enabled; the large spte is not nuked, because it is read-only.
 - A write fault on such memory causes the guest to loop endlessly (the fault must go down to level 1 because dirty logging is enabled).

Fix by dropping the large spte when necessary.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-25  x86, kaslr: add missed "static" declarations  (Kees Cook; 1 file, -4/+5)

This silences build warnings about unexported variables and functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20140209215644.GA30339@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25  x86, kaslr: export offset in VMCOREINFO ELF notes  (Eugene Surovegin; 1 file, -0/+2)

Include the kASLR offset in the VMCOREINFO ELF notes to assist in debugging.

[ hpa: pushing this for v3.14 to avoid having a kernel version with kASLR where we can't debug output. ]

Signed-off-by: Eugene Surovegin <surovegin@google.com>
Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
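The export amounts to one extra VMCOREINFO line; a sketch, with the offset computed as the distance the kernel text moved (the expression is my reading of the change, so verify against the tree):

    void arch_crash_save_vmcoreinfo(void)
    {
            /* ... existing notes ... */
            vmcoreinfo_append_str("KERNELOFFSET=%lx\n",
                                  (unsigned long)&_text - __START_KERNEL);
    }
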
2014-02-25  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k  (Linus Torvalds; 5 files, -12/+8)

Pull m68k update from Geert Uytterhoeven:
 - More barrier.h consolidation
 - Sched_[gs]etattr() syscalls

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Wire up sched_setattr and sched_getattr
  m68k: Switch to asm-generic/barrier.h
  m68k: Sort arch/m68k/include/asm/Kbuild