aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-09-26bpf: enable bpf syscall on x64 and i386Alexei Starovoitov2-0/+2
done as separate commit to ease conflict resolution Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-83/+31
2014-09-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-2/+2
Pull crypto fixes from Herbert Xu: "This fixes three issues: - if ccp is loaded on a machine without ccp, it will incorrectly activate causing all requests to fail. Fixed by preventing ccp from loading if hardware isn't available. - not all IRQs were enabled for the qat driver, leading to potential stalls when it is used - disabled buggy AVX CTR implementation in aesni" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni - disable "by8" AVX CTR optimization crypto: ccp - Check for CCP before registering crypto algs crypto: qat - Enable all 32 IRQs
2014-09-24crypto: aesni - disable "by8" AVX CTR optimizationMathias Krause1-2/+2
The "by8" implementation introduced in commit 22cddcc7df8f ("crypto: aes - AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it handles counter block overflows differently. It only accounts the right most 32 bit as a counter -- not the whole block as all other implementations do. This makes it fail the cryptomgr test #4 that specifically tests this corner case. As we're quite late in the release cycle, just disable the "by8" variant for now. Reported-by: Romain Francoise <romain@orebokech.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller12-78/+116
Conflicts: arch/mips/net/bpf_jit.c drivers/net/can/flexcan.c Both the flexcan and MIPS bpf_jit conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22Revert "x86/efi: Fixup GOT in all boot code paths"Linus Torvalds2-81/+29
This reverts commit 9cb0e394234d244fe5a97e743ec9dd7ddff7e64b. It causes my Sony Vaio Pro 11 to immediately reboot at startup. Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-19Merge tag 'pci-v3.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds1-23/+1
Pull PCI fixes from Bjorn Helgaas: "These fix: - Boot video device detection on dual-GPU Apple systems - Hotplug fiascos on VGA switcheroo with radeon & nouveau drivers - Boot hang on Freescale i.MX6 systems - Excessive "no hotplug settings from platform" warnings In particular: Enumeration - Don't default exclusively to first video device (Bruno Prémont) PCI device hotplug - Remove "no hotplug settings from platform" warning (Bjorn Helgaas) - Add pci_ignore_hotplug() for VGA switcheroo (Bjorn Helgaas) Freescale i.MX6 - Put LTSSM in "Detect" state before disabling (Lucas Stach)" * tag 'pci-v3.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: vgaarb: Drop obsolete #ifndef vgaarb: Don't default exclusively to first video device with mem+io ACPIPHP / radeon / nouveau: Remove acpi_bus_no_hotplug() PCI: Remove "no hotplug settings from platform" warning PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device PCI: imx6: Put LTSSM in "Detect" state before disabling it MAINTAINERS: Add Lucas Stach as co-maintainer for i.MX6 PCI driver
2014-09-19Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+3
Pull perf fixes from Ingo Molnar: "Two kernel side fixes: a kprobes fix and a perf_remove_from_context() fix (which does not yet fix the migration bug which is WIP)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix a race condition in perf_remove_from_context() kprobes/x86: Free 'optinsn' cache when range check fails
2014-09-19Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-37/+98
Pull x86 fixes from Ingo Molnar: "Misc fixes: EFI fixes, a build fix, a page table dumping (debug) fix and a clang build fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm64: Fix fdt-related memory reservation x86/mm: Apply the section attribute to the variable, not its type x86/efi: Fixup GOT in all boot code paths x86/efi: Only load initrd above 4g on second try x86-64, ptdump: Mark espfix area only if existent x86, irq: Fix build error caused by 9eabc99a635a77cbf09
2014-09-16vgaarb: Don't default exclusively to first video device with mem+ioBruno Prémont1-23/+1
Commit 20cde694027e ("x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup()") moved boot video device detection from efifb to x86 and ia64 pci/fixup.c. For dual-GPU Apple computers above change represents a regression as code in efifb did forcefully override vga_default_device while the merge did not (vgaarb happens prior to PCI fixup). To improve on initial device selection by vgaarb (it cannot know if PCI device not behind bridges see/decode legacy VGA I/O or not), move the screen_info based check from pci_video_fixup() to vgaarb's init function and use it to refine/override decision taken while adding the individual PCI VGA devices. This way PCI fixup has no reason to adjust vga_default_device anymore but can depend on its value for flagging shadowed VBIOS. This has the nice benefit of removing duplicated code but does introduce a #if defined() block in vgaarb. Not all architectures have screen_info and would cause compile to fail without it. Link: https://bugzilla.kernel.org/show_bug.cgi?id=84461 Reported-and-Tested-By: Andreas Noever <andreas.noever@gmail.com> Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Matthew Garrett <matthew.garrett@nebula.com> CC: stable@vger.kernel.org # v3.5+
2014-09-13Make ARCH_HAS_FAST_MULTIPLIER a real config variableLinus Torvalds2-2/+1
It used to be an ad-hoc hack defined by the x86 version of <asm/bitops.h> that enabled a couple of library routines to know whether an integer multiply is faster than repeated shifts and additions. This just makes it use the real Kconfig system instead, and makes x86 (which was the only architecture that did this) select the option. NOTE! Even for x86, this really is kind of wrong. If we cared, we would probably not enable this for builds optimized for netburst (P4), where shifts-and-adds are generally faster than multiplies. This patch does *not* change that kind of logic, though, it is purely a syntactic change with no code changes. This was triggered by the fact that we have other places that really want to know "do I want to expand multiples by constants by hand or not", particularly the hash generation code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-11Merge tag 'stable/for-linus-3.17-b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds2-15/+13
Pull Xen bug fixes from David Vrabel: - fix for PVHVM suspend/resume and migration - don't pointlessly retry certain ballooning ops - fix gntalloc when grefs have run out. - fix PV boot if KSALR is enable or very large modules are used. * tag 'stable/for-linus-3.17-b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: don't copy bogus duplicate entries into kernel page tables xen/gntalloc: safely delete grefs in add_grefs() undo path xen/gntalloc: fix oops after runnning out of grant refs xen/balloon: cancel ballooning if adding new memory failed xen/manage: Always freeze/thaw processes when suspend/resuming
2014-09-10x86/xen: don't copy bogus duplicate entries into kernel page tablesStefan Bader2-15/+13
When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded modules exceeds 512 MiB, then loading modules fails with a warning (and hence a vmalloc allocation failure) because the PTEs for the newly-allocated vmalloc address space are not zero. WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128 vmap_page_range_noflush+0x2a1/0x360() This is caused by xen_setup_kernel_pagetables() copying level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present entries. Without KASLR, the normal kernel image size only covers the first half of level2_kernel_pgt and module space starts after that. L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel [256..511]->module [511]->level2_fixmap_pgt[ 0..505]->module This allows 512 MiB of of module vmalloc space to be used before having to use the corrupted level2_fixmap_pgt entries. With KASLR enabled, the kernel image uses the full PUD range of 1G and module space starts in the level2_fixmap_pgt. So basically: L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel [511]->level2_fixmap_pgt[0..505]->module And now no module vmalloc space can be used without using the corrupt level2_fixmap_pgt entries. Fix this by properly converting the level2_fixmap_pgt entries to MFNs, and setting level1_fixmap_pgt as read-only. A number of comments were also using the the wrong L3 offset for level2_kernel_pgt. These have been corrected. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org
2014-09-09net: bpf: be friendly to kmemcheckDaniel Borkmann1-1/+1
Reported by Mikulas Patocka, kmemcheck currently barks out a false positive since we don't have special kmemcheck annotation for bitfields used in bpf_prog structure. We currently have jited:1, len:31 and thus when accessing len while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that we're reading uninitialized memory. As we don't need the whole bit universe for pages member, we can just split it to u16 and use a bool flag for jited instead of a bitfield. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09net: bpf: consolidate JIT binary allocatorDaniel Borkmann1-39/+11
Introduced in commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") and later on replicated in aa2d2c73c21f ("s390/bpf,jit: address randomize and write protect jit code") for s390 architecture, write protection for BPF JIT images got added and a random start address of the JIT code, so that it's not on a page boundary anymore. Since both use a very similar allocator for the BPF binary header, we can consolidate this code into the BPF core as it's mostly JIT independant anyway. This will also allow for future archs that support DEBUG_SET_MODULE_RONX to just reuse instead of reimplementing it. JIT tested on x86_64 and s390x with BPF test suite. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09net: filter: add "load 64-bit immediate" eBPF instructionAlexei Starovoitov1-0/+17
add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register. All previous instructions were 8-byte. This is first 16-byte instruction. Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction: insn[0].code = BPF_LD | BPF_DW | BPF_IMM insn[0].dst_reg = destination register insn[0].imm = lower 32-bit insn[1].code = 0 insn[1].imm = upper 32-bit All unused fields must be zero. Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads 32-bit immediate value into a register. x64 JITs it as single 'movabsq %rax, imm64' arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn Note that old eBPF programs are binary compatible with new interpreter. It helps eBPF programs load 64-bit constant into a register with one instruction instead of using two registers and 4 instructions: BPF_MOV32_IMM(R1, imm32) BPF_ALU64_IMM(BPF_LSH, R1, 32) BPF_MOV32_IMM(R2, imm32) BPF_ALU64_REG(BPF_OR, R1, R2) User space generated programs will use this instruction to load constants only. To tell kernel that user space needs a pointer the _pseudo_ variant of this instruction may be added later, which will use extra bits of encoding to indicate what type of pointer user space is asking kernel to provide. For example 'off' or 'src_reg' fields can be used for such purpose. src_reg = 1 could mean that user space is asking kernel to validate and load in-kernel map pointer. src_reg = 2 could mean that user space needs readonly data section pointer src_reg = 3 could mean that user space needs a pointer to per-cpu local data All such future pseudo instructions will not be carrying the actual pointer as part of the instruction, but rather will be treated as a request to kernel to provide one. The kernel will verify the request_for_a_pointer, then will drop _pseudo_ marking and will store actual internal pointer inside the instruction, so the end result is the interpreter and JITs never see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that loads 64-bit immediate into a register. User space never operates on direct pointers and verifier can easily recognize request_for_pointer vs other instructions. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgentIngo Molnar3-36/+92
Pull EFI fixes from Matt Fleming: * Fix early boot regression affecting x86 EFI boot stub when loading initrds above 4GB - Yinghai Lu * Relocate GOT entries in the x86 EFI boot stub now that we have symbols with global visibility - Matt Fleming * fdt memory reservation fix for arm64 - Mark Salter Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-09x86/mm: Apply the section attribute to the variable, not its typeJan-Simon Möller1-1/+1
This fixes a compilation error in clang in that a linker section attribute can't be added to a type: arch/x86/mm/mmap.c:34:8: error: '__section__' attribute only applies to functions and global variables struct __read_mostly ... By moving the section attribute to the variable declaration, the desired effect is achieved. Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Signed-off-by: Behan Webster <behanw@converseincode.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1409959005-11479-1-git-send-email-behanw@converseincode.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-08x86/efi: Fixup GOT in all boot code pathsMatt Fleming2-29/+81
Maarten reported that his Macbook pro 8.2 stopped booting after commit f23cf8bd5c1f49 ("efi/x86: efistub: Move shared dependencies to <asm/efi.h>"), the main feature of which is changing the visibility of symbol 'efi_early' from local to global. By making 'efi_early' global we end up requiring an entry in the Global Offset Table. Unfortunately, while we do include code to fixup GOT entries in the early boot code, it's only called after we've executed the EFI boot stub. What this amounts to is that references to 'efi_early' in the EFI boot stub don't point to the correct place. Since we've got multiple boot entry points we need to be prepared to fixup the GOT in multiple places, while ensuring that we never do it more than once, otherwise the GOT entries will still point to the wrong place. Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-08x86/efi: Only load initrd above 4g on second tryYinghai Lu1-7/+11
Mantas found that after commit 4bf7111f5016 ("x86/efi: Support initrd loaded above 4G"), the kernel freezes at the earliest possible moment when trying to boot via UEFI on Asus laptop. Revert to old way to load initrd under 4G on first try, second try will use above 4G buffer when initrd is too big and does not fit under 4G. [ The cause of the freeze appears to be a firmware bug when reading file data into buffers above 4GB, though the exact reason is unknown. Mantas reports that the hang can be avoid if the file size is a multiple of 512 bytes, but I've seen some ASUS firmware simply corrupting the file data rather than freezing. Laszlo fixed an issue in the upstream EDK2 DiskIO code in Aug 2013 which may possibly be related, commit 4e39b75e ("MdeModulePkg/DiskIoDxe: fix source/destination pointer of overrun transfer"). Whatever the cause, it's unlikely that a fix will be forthcoming from the vendor, hence the workaround - Matt ] Cc: Laszlo Ersek <lersek@redhat.com> Reported-by: Mantas Mikulėnas <grawity@gmail.com> Reported-by: Harald Hoyer <harald@redhat.com> Tested-by: Anders Darander <anders@chargestorm.se> Tested-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-08x86-64, ptdump: Mark espfix area only if existentMathias Krause1-0/+4
We should classify the espfix area as such only if we actually have enabled the corresponding option. Otherwise the page table dump might look confusing. Signed-off-by: Mathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/1410114629-24523-1-git-send-email-minipli@googlemail.com Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller16-29/+82
2014-09-05net: bpf: make eBPF interpreter images read-onlyDaniel Borkmann1-12/+6
With eBPF getting more extended and exposure to user space is on it's way, hardening the memory range the interpreter uses to steer its command flow seems appropriate. This patch moves the to be interpreted bytecode to read-only pages. In case we execute a corrupted BPF interpreter image for some reason e.g. caused by an attacker which got past a verifier stage, it would not only provide arbitrary read/write memory access but arbitrary function calls as well. After setting up the BPF interpreter image, its contents do not change until destruction time, thus we can setup the image on immutable made pages in order to mitigate modifications to that code. The idea is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). This is possible because bpf_prog is not part of sk_filter anymore. After setup bpf_prog cannot be altered during its life-time. This prevents any modifications to the entire bpf_prog structure (incl. function/JIT image pointer). Every eBPF program (including classic BPF that are migrated) have to call bpf_prog_select_runtime() to select either interpreter or a JIT image as a last setup step, and they all are being freed via bpf_prog_free(), including non-JIT. Therefore, we can easily integrate this into the eBPF life-time, plus since we directly allocate a bpf_prog, we have no performance penalty. Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual inspection of kernel_page_tables. Brad Spengler proposed the same idea via Twitter during development of this patch. Joint work with Hannes Frederic Sowa. Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01x86, irq: Fix build error caused by 9eabc99a635a77cbf09Jiang Liu1-0/+1
Commit 9eabc99a635a77cbf09 causes following build error when IOAPIC is disabled. arch/x86/pci/irq.c: In function 'pirq_disable_irq': >> arch/x86/pci/irq.c:1259:2: error: implicit declaration of function 'mp_should_keep_irq' [-Werror=implicit-function-declaration] if (io_apic_assign_pci_irqs && !mp_should_keep_irq(&dev->dev) && ^ cc1: some warnings being treated as errors Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1409382916-10649-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-29Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-4/+33
Pull x86 fixes from Peter Anvin: "One patch to avoid assigning interrupts we don't actually have on non-PC platforms, and two patches that addresses bugs in the new IOAPIC assignment code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, irq, PCI: Keep IRQ assignment for runtime power management x86: irq: Fix bug in setting IOAPIC pin attributes x86: Fix non-PC platform kernel crash on boot due to NULL dereference
2014-08-29kexec: purgatory: add clean-up for purgatory directoryMichael Welling1-0/+1
Without this patch the kexec-purgatory.c and purgatory.ro files are not removed after make mrproper. Signed-off-by: Michael Welling <mwelling@ieee.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29x86/purgatory: use approprate -m64/-32 build flag for arch/x86/purgatoryVivek Goyal1-0/+1
Thomas reported that build of x86_64 kernel was failing for him. He is using 32bit tool chain. Problem is that while compiling purgatory, I have not specified -m64 flag. And 32bit tool chain must be assuming -m32 by default. Following is error message. (mini) [~/work/linux-2.6] make scripts/kconfig/conf --silentoldconfig Kconfig CHK include/config/kernel.release UPD include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h UPD include/generated/utsrelease.h CC arch/x86/purgatory/purgatory.o arch/x86/purgatory/purgatory.c:1:0: error: code model 'large' not supported in the 32 bit mode Fix it by explicitly passing appropriate -m64/-m32 build flag for purgatory. Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Tested-by: Thomas Glanzmann <thomas@glanzmann.de> Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29kexec: create a new config option CONFIG_KEXEC_FILE for new syscallVivek Goyal7-20/+31
Currently new system call kexec_file_load() and all the associated code compiles if CONFIG_KEXEC=y. But new syscall also compiles purgatory code which currently uses gcc option -mcmodel=large. This option seems to be available only gcc 4.4 onwards. Hiding new functionality behind a new config option will not break existing users of old gcc. Those who wish to enable new functionality will require new gcc. Having said that, I am trying to figure out how can I move away from using -mcmodel=large but that can take a while. I think there are other advantages of introducing this new config option. As this option will be enabled only on x86_64, other arches don't have to compile generic kexec code which will never be used. This new code selects CRYPTO=y and CRYPTO_SHA256=y. And all other arches had to do this for CONFIG_KEXEC. Now with introduction of new config option, we can remove crypto dependency from other arches. Now CONFIG_KEXEC_FILE is available only on x86_64. So whereever I had CONFIG_X86_64 defined, I got rid of that. For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to "depends on CRYPTO=y". This should be safer as "select" is not recursive. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Tested-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29x86,mm: fix pte_special versus pte_numaHugh Dickins1-2/+7
Sasha Levin has shown oopses on ffffea0003480048 and ffffea0003480008 at mm/memory.c:1132, running Trinity on different 3.16-rc-next kernels: where zap_pte_range() checks page->mapping to see if PageAnon(page). Those addresses fit struct pages for pfns d2001 and d2000, and in each dump a register or a stack slot showed d2001730 or d2000730: pte flags 0x730 are PCD ACCESSED PROTNONE SPECIAL IOMAP; and Sasha's e820 map has a hole between cfffffff and 100000000, which would need special access. Commit c46a7c817e66 ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") has broken vm_normal_page(): a PROTNONE SPECIAL pte no longer passes the pte_special() test, so zap_pte_range() goes on to try to access a non-existent struct page. Fix this by refining pte_special() (SPECIAL with PRESENT or PROTNONE) to complement pte_numa() (SPECIAL with neither PRESENT nor PROTNONE). A hint that this was a problem was that c46a7c817e66 added pte_numa() test to vm_normal_page(), and moved its is_zero_pfn() test from slow to fast path: This was papering over a pte_special() snag when the zero page was encountered during zap. This patch reverts vm_normal_page() to how it was before, relying on pte_special(). It still appears that this patch may be incomplete: aren't there other places which need to be handling PROTNONE along with PRESENT? For example, pte_mknuma() clears _PAGE_PRESENT and sets _PAGE_NUMA, but on a PROT_NONE area, that would make it pte_special(). This is side-stepped by the fact that NUMA hinting faults skipped PROT_NONE VMAs and there are no grounds where a NUMA hinting fault on a PROT_NONE VMA would be interesting. Fixes: c46a7c817e66 ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: <stable@vger.kernel.org> [3.16] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29x86, irq, PCI: Keep IRQ assignment for runtime power managementJiang Liu4-2/+16
Now IOAPIC driver dynamically allocates IRQ numbers for IOAPIC pins. We need to keep IRQ assignment for PCI devices during runtime power management, otherwise it may cause failure of device wakeups. Commit 3eec595235c17a7 "x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation" has fixed the issue for suspend/ hibernation, we also need the same fix for runtime device sleep too. Fix: https://bugzilla.kernel.org/show_bug.cgi?id=83271 Reported-and-Tested-by: EmanueL Czirai <amanual@openmailbox.org> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: EmanueL Czirai <amanual@openmailbox.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1409304383-18806-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-27kprobes/x86: Free 'optinsn' cache when range check failsWang Nan1-1/+3
This patch frees the 'optinsn' slot when we get a range check error, to prevent memory leaks. Before this patch, cache entry in kprobe_insn_cache() won't be freed if kprobe optimizing fails due to range check failure. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Pei Feiyue <peifeiyue@huawei.com> Link: http://lkml.kernel.org/r/1406550019-70935-1-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-27x86: irq: Fix bug in setting IOAPIC pin attributesJiang Liu1-1/+14
Commit 15a3c7cc9154321fc3 "x86, irq: Introduce two helper functions to support irqdomain map operation" breaks LPSS ACPI enumerated devices. On startup, IOAPIC driver preallocates IRQ descriptors and programs IOAPIC pins with default level and polarity attributes for all legacy IRQs. Later legacy IRQ users may fail to set IOAPIC pin attributes if the requested attributes conflicts with the default IOAPIC pin attributes. So change mp_irqdomain_map() to allow the first legacy IRQ user to reprogram IOAPIC pin with different attributes. Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1409118795-17046-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-25bpf: x86: add missing 'shift by register' instructions to x64 eBPF JITAlexei Starovoitov1-0/+42
'shift by register' operations are supported by eBPF interpreter, but were accidently left out of x64 JIT compiler. Fix it and add a testcase. Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25x86: Fix non-PC platform kernel crash on boot due to NULL dereferenceAndy Shevchenko2-1/+3
Upstream commit: 95d76acc7518d5 ("x86, irq: Count legacy IRQs by legacy_pic->nr_legacy_irqs instead of NR_IRQS_LEGACY") removed reserved interrupts for the platforms that do not have a legacy IOAPIC. Which breaks the boot on Intel MID platforms such as Medfield: BUG: unable to handle kernel NULL pointer dereference at 0000003a IP: [<c107079a>] setup_irq+0xf/0x4d [ 0.000000] *pdpt = 0000000000000000 *pde = 9bbf32453167e510 The culprit is an uncoditional setting of IRQ2 which is used as cascade IRQ on legacy platforms. It seems we have to check if we have enough legacy IRQs reserved before we can call setup_irq(). The fix adds such check in native_init_IRQ() and in setup_default_timer_irq(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Cohen <david.a.cohen@linux.intel.com> Link: http://lkml.kernel.org/r/1405931920-12871-1-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-24Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-3/+9
Pull x86 fixes from Ingo Molnar: "A couple of EFI fixes, plus misc fixes all around the map" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm64: Store Runtime Services revision firmware: Do not use WARN_ON(!spin_is_locked()) x86_32, entry: Clean up sysenter_badsys declaration x86/doc: Fix the 'tlb_single_page_flush_ceiling' sysconfig path x86/mm: Fix sparse 'tlb_single_page_flush_ceiling' warning and make the variable read-mostly x86/mm: Fix RCU splat from new TLB tracepoints
2014-08-22Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgentIngo Molnar94-1400/+3547
Pull EFI fixes from Matt Fleming: * WARN_ON(!spin_is_locked()) always triggers on non-SMP machines. Swap it for the more canonical lockdep_assert_held() which always does the right thing - Guenter Roeck * Assign the correct value to efi.runtime_version on arm64 so that all the runtime services can be invoked - Semen Protsenko Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-19Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"Paolo Bonzini1-1/+1
This reverts commit 682367c494869008eb89ef733f196e99415ae862, which causes 32-bit SMP Windows 7 guests to panic. SeaBIOS has a limit on the number of MTRRs that it can handle, and this patch exceeded the limit. Better revert it. Thanks to Nadav Amit for debugging the cause. Cc: stable@nongnu.org Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19KVM: x86: do not check CS.DPL against RPL during task switchPaolo Bonzini1-3/+0
This reverts the check added by commit 5045b468037d (KVM: x86: check CS.DPL against RPL during task switch, 2014-05-15). Although the CS.DPL=CS.RPL check is mentioned in table 7-1 of the SDM as causing a #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS conditions" which cause #TSS exceptions. In fact it causes some tests to fail, which pass on bare-metal. Keep the rest of the commit, since we will find new uses for it in 3.18. Reported-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19KVM: x86: Avoid emulating instructions on #UD mistakenlyNadav Amit1-4/+4
Commit d40a6898e5 mistakenly caused instructions which are not marked as EmulateOnUD to be emulated upon #UD exception. The commit caused the check of whether the instruction flags include EmulateOnUD to never be evaluated. As a result instructions whose emulation is broken may be emulated. This fix moves the evaluation of EmulateOnUD so it would be evaluated. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Tweak operand order in &&, remove EmulateOnUD where it's now superfluous. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-16Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linuxLinus Torvalds1-0/+3
Pull idle update from Len Brown: "Two Intel-platform-specific updates to intel_idle, and a cosmetic tweak to the turbostat utility" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: tweak whitespace in output format intel_idle: Broadwell support intel_idle: Disable Baytrail Core and Module C6 auto-demotion
2014-08-15intel_idle: Disable Baytrail Core and Module C6 auto-demotionLen Brown1-0/+3
Power efficiency improves on Baytrail (Intel Atom Processor E3000) when Linux disables C6 auto-demotion. Based on work by Srinidhi Kasagar <srinidhi.kasagar@intel.com>. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
2014-08-15x86_32, entry: Clean up sysenter_badsys declarationStefan Bader1-1/+1
commit 554086d85e "x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)" introduced a new jump label (sysenter_badsys) but somehow the END statements seem to have gone wrong (at least it feels that way to me). This does not seem to be a fatal problem, but just for the sake of symmetry, change the second syscall_badsys to sysenter_badsys. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Link: http://lkml.kernel.org/r/1408093066-31021-1-git-send-email-stefan.bader@canonical.com Acked-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-08-14Merge tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds2-6/+6
Pull DEFINE_PCI_DEVICE_TABLE removal from Bjorn Helgaas: "Part two of the PCI changes for v3.17: - Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine) It's a mechanical change that removes uses of the DEFINE_PCI_DEVICE_TABLE macro. I waited until later in the merge window to reduce conflicts, but it's possible you'll still see a few" * tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use
2014-08-14Merge tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds2-6/+6
Pull Xen bugfixes from David Vrabel: - fix ARM build - fix boot crash with PVH guests - improve reliability of resume/migration * tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: use vmap() to map grant table pages in PVH guests x86/xen: resume timer irqs early arm/xen: remove duplicate arch_gnttab_init() function
2014-08-13Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds32-1076/+912
Pull x86/apic updates from Thomas Gleixner: "This is a major overhaul to the x86 apic subsystem consisting of the following parts: - Remove obsolete APIC driver abstractions (David Rientjes) - Use the irqdomain facilities to dynamically allocate IRQs for IOAPICs. This is a prerequisite to enable IOAPIC hotplug support, and it also frees up wasted vectors (Jiang Liu) - Misc fixlets. Despite the hickup in Ingos previous pull request - caused by the missing fixup for the suspend/resume issue reported by Borislav - I strongly recommend that this update finds its way into 3.17. Some history for you: This is preparatory work for physical IOAPIC hotplug. The first attempt to support this was done by Yinghai and I shot it down because it just added another layer of obscurity and complexity to the already existing mess without tackling the underlying shortcomings of the current implementation. After quite some on- and offlist discussions, I requested that the design of this functionality must use generic infrastructure, i.e. irq domains, which provide all the mechanisms to dynamically map linux interrupt numbers to physical interrupts. Jiang picked up the idea and did a great job of consolidating the existing interfaces to manage the x86 (IOAPIC) interrupt system by utilizing irq domains. The testing in tip, Linux-next and inside of Intel on various machines did not unearth any oddities until Borislav exposed it to one of his oddball machines. The issue was resolved quickly, but unfortunately the fix fell through the cracks and did not hit the tip tree before Ingo sent the pull request. Not entirely Ingos fault, I also assumed that the fix was already merged when Ingo asked me whether he could send it. Nevertheless this work has a proper design, has undergone several rounds of review and the final fallout after applying it to tip and integrating it into Linux-next has been more than moderate. It's the ground work not only for IOAPIC hotplug, it will also allow us to move the lowlevel vector allocation into the irqdomain hierarchy, which will benefit other architectures as well. Patches are posted already, but they are on hold for two weeks, see below. I really appreciate the competence and responsiveness Jiang has shown in course of this endavour. So I'm sure that any fallout of this will be addressed in a timely manner. FYI, I'm vanishing for 2 weeks into my annual kids summer camp kitchen duty^Wvacation, while you folks are drooling at KS/LinuxCon :) But HPA will have a look at the hopefully zero fallout until I'm back" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation x86/apic/vsmp: Make is_vsmp_box() static x86, apic: Remove enable_apic_mode callback x86, apic: Remove setup_portio_remap callback x86, apic: Remove multi_timer_check callback x86, apic: Replace noop_check_apicid_used x86, apic: Remove check_apicid_present callback x86, apic: Remove mps_oem_check callback x86, apic: Remove smp_callin_clear_local_apic callback x86, apic: Replace trampoline physical addresses with defaults x86, apic: Remove x86_32_numa_cpu_node callback x86: intel-mid: Use the new io_apic interfaces x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box() x86, irq: Clean up irqdomain transition code x86, irq, devicetree: Release IOAPIC pin when PCI device is disabled x86, irq, SFI: Release IOAPIC pin when PCI device is disabled x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled x86, irq, ACPI: Release IOAPIC pin when PCI device is disabled x86, irq: Introduce helper functions to release IOAPIC pin x86, irq: Simplify the way to handle ISA IRQ ...
2014-08-13Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+2
Pull x86/efix fixes from Peter Anvin: "Two EFI-related Kconfig changes, which happen to touch immediately adjacent lines in Kconfig and thus collapse to a single patch" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub x86/efi: Fix 3DNow optimization build failure in EFI stub
2014-08-13Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds8-70/+309
Pull x86/xsave changes from Peter Anvin: "This is a patchset to support the XSAVES instruction required to support context switch of supervisor-only features in upcoming silicon. This patchset missed the 3.16 merge window, which is why it is based on 3.15-rc7" * 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, xsave: Add forgotten inline annotation x86/xsaves: Clean up code in xstate offsets computation in xsave area x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi) Define kernel API to get address of each state in xsave area x86/xsaves: Enable xsaves/xrstors x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time x86/xsaves: Add xsaves and xrstors support for booting time x86/xsaves: Clear reserved bits in xsave header x86/xsaves: Use xsave/xrstor for saving and restoring user space context x86/xsaves: Use xsaves/xrstors for context switch x86/xsaves: Use xsaves/xrstors to save and restore xsave area x86/xsaves: Define a macro for handling xsave/xrstor instruction fault x86/xsaves: Define macros for xsave instructions x86/xsaves: Change compacted format xsave area header x86/alternative: Add alternative_input_2 to support alternative with two features and input x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
2014-08-12PCI: Remove DEFINE_PCI_DEVICE_TABLE macro useBenoit Taine2-6/+6
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines. This issue was reported by checkpatch. A simplified version of the semantic patch that makes this change is as follows (http://coccinelle.lip6.fr/): // <smpl> @@ identifier i; declarer name DEFINE_PCI_DEVICE_TABLE; initializer z; @@ - DEFINE_PCI_DEVICE_TABLE(i) + const struct pci_device_id i[] = z; // </smpl> [bhelgaas: add semantic patch] Signed-off-by: Benoit Taine <benoit.taine@lip6.fr> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-08-11Merge tag 'efi-urgent' into x86/efiH. Peter Anvin3-12/+53
* Enforce CONFIG_RELOCATABLE for the x86 EFI boot stub, otherwise it's possible to overwrite random pieces of unallocated memory during kernel decompression, leading to machine resets. Resolved Conflicts: arch/x86/Kconfig Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-08-11x86/xen: use vmap() to map grant table pages in PVH guestsDavid Vrabel1-5/+5
Commit b7dd0e350e0b (x86/xen: safely map and unmap grant frames when in atomic context) causes PVH guests to crash in arch_gnttab_map_shared() when they attempted to map the pages for the grant table. This use of a PV-specific function during the PVH grant table setup is non-obvious and not needed. The standard vmap() function does the right thing. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Mukesh Rathor <mukesh.rathor@oracle.com> Tested-by: Mukesh Rathor <mukesh.rathor@oracle.com> Cc: stable@vger.kernel.org