aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2015-11-27s390: add 'install' target to 'make help'Michael Holzheu1-0/+4
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/sclp: Add VT220 support to early sclp consoleSascha Silbe2-16/+51
When running under qemu with the default configuration (-nographic), there is only a VT220 SCLP console, no line-mode SCLP console. Add VT220 support to the early SCLP console so the user has a chance to see critical error messages during early boot. None of the existing users of _sclp_print_early() check the return code. Instead of trying to come up with return code semantics when printing to multiple consoles (any or all of which may fail), we just drop the return code entirely. Tested on z/VM (line mode console) and LPAR (VT220 and line mode console). Tested on qemu/KVM with VT220 console and / or line mode console. Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/dis: Fix printing of the register numbersChristian Borntraeger1-2/+2
Since commit b006f19b055f ("lib/vsprintf.c: handle invalid format specifiers more robustly") I get errors like [...] Krnl Code: 00000000004e2410: c00400000000 brcl 0,4e2410 Please remove unsupported %r in format string [ 8.179483] ------------[ cut here ]------------ [ 8.179484] WARNING: at lib/vsprintf.c:1781 Turns out that our disassembler relied on %r not being used as format string. Let's do the proper escaping of our decode buffers. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/sclp_cpi: remove sclp_cpi module in favor of sysfs interfaceHendrik Brueckner3-54/+0
Since commit c05ffc4f2b20 ("[S390] sclp: sysfs interface for SCLP cpi"), which was made 2008 the user can specify a system and sysplex name through the /sys/firmware/cpi interface. In addition to sysplex and system name, the user can also override the system type and system version. Because the syfs interface is easier to use and allows the settings to be updated, the sclp_cpi module becomes obsolete and can be removed. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390: Delete unnecessary checks before the function call "debug_unregister"Markus Elfring4-16/+8
The debug_unregister() function performs also input parameter validation. Thus the test around the calls is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/pci_dma: fix DMA table corruption with > 4 TB main memoryGerald Schaefer3-7/+17
DMA addresses returned from map_page() are calculated by using an iommu bitmap plus a start_dma offset. The size of this bitmap is based on the main memory size. If we have more than (4 TB - start_dma) main memory, the DMA address calculation will also produce addresses > 4 TB. Such addresses cannot be inserted in the 3-level DMA page table, instead the entries modulo 4 TB will be overwritten. Fix this by restricting the iommu bitmap size to (4 TB - start_dma). Also set zdev->end_dma to the actual end address of the usable range, instead of the theoretical maximum as reported by the hardware, which fixes a sanity check in dma_map() and also the IOMMU API domain geometry aperture calculation. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390: get_user_pages_fast() might sleepDavid Hildenbrand1-0/+1
Let's annotate it correctly, so we directly get a warning if we ever were to use it in atomic/preempt_disable/spinlock environment. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/spinlock: avoid diagnose loopMartin Schwidefsky1-9/+19
The spinlock implementation calls the diagnose 0x9c / 0x44 immediately if the SIGP sense running reported the target CPU as not running. The diagnose 0x9c is a hint to the hypervisor to schedule the target CPU in preference to the source CPU that issued the diagnose. It can happen that on return from the diagnose the target CPU has not been scheduled yet, e.g. if the target logical CPU is on another physical CPU and the hypervisor did not want to migrate the logical CPU. Avoid the immediate repeat of the diagnose instruction, instead do the retry loop before the next invocation of diagnose 0x9c. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/dump: cleanup CPU save area handlingMartin Schwidefsky5-198/+145
Introduce save_area_alloc(), save_area_boot_cpu(), save_area_add_regs() and save_area_add_vxrs to deal with storing the CPU state in case of a system dump. Remove struct save_area and save_area_ext, and create a new struct save_area as a local definition to arch/s390/kernel/crash_dump.c. Copy each individual field from the hardware status area to the save area, storing the minimum of required data. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/dump: rework CPU register dump codeMartin Schwidefsky12-145/+158
To collect the CPU registers of the crashed system allocated a single page with memblock_alloc_base and use it as a copy buffer. Replace the stop-and-store-status sigp with a store-status-at-address sigp in smp_save_dump_cpus() and smp_store_status(). In both cases the target CPU is already stopped and store-status-at-address avoids the detour via the absolute zero page. For kexec simplify s390_reset_system and call store_status() before the prefix register of the boot CPU has been set to zero. Use STPX to store the prefix register and remove dump_prefix_page. Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/dump: remove SAVE_AREA_BASEMartin Schwidefsky5-36/+42
Replace the SAVE_AREA_BASE offset calculations in reipl.S with the assembler constant for the location of each register status area. Use __LC_FPREGS_SAVE_AREA instead of SAVE_AREA_BASE in the three remaining code locations and remove the definition of SAVE_AREA_BASE. Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/kvm: remove dependency on struct save_area definitionMartin Schwidefsky2-15/+16
Replace the offsets based on the struct area_area with the offset constants from asm-offsets.c based on the struct _lowcore. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/zcore: simplify memcpy_hsaMartin Schwidefsky1-68/+33
Replace the three part copy logic int memcpy_hsa with a single loop around sclp_sdias_copy with appropriate offset and size calculations, and inline memcpy_hsa into memcpy_hsa_user and memcpy_hsa_kernel. Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/dump: streamline oldmem copy functionsMartin Schwidefsky6-102/+105
Introduce two copy functions for the memory of the dumped system, copy_oldmem_kernel() to copy to the virtual kernel address space and copy_oldmem_user() to copy to user space. Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/kdump: remove code to create ELF notes in the crashed systemMartin Schwidefsky5-63/+28
The s390 architecture can store the CPU registers of the crashed system after the kdump kernel has been started and this is the preferred way. Remove the remaining code fragments that deal with storing CPU registers while the crashed system is still active. Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/zcore: remove /sys/kernel/debug/zcore/memMartin Schwidefsky4-403/+18
New versions of the SCSI dumper use the /dev/vmcore interface instead of zcore mem. Remove the outdated interface. Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/zcore: copy vector registers into the image dataMartin Schwidefsky2-16/+55
The /sys/kernel/debug/zcore/mem interface delivers the memory of the old system with the CPU registers stored to the assigned locations in each prefix page. For the vector registers the prefix page of each CPU has an address of a 1024 byte save area at 0x11b0. But the /sys/kernel/debug/zcore/mem interface fails copy the vector registers saved at boot of the zfcpdump kernel into the dump image. Copy the saved vector registers of a CPU to the outout buffer if the memory area that is read via /sys/kernel/debug/zcore/mem intersects with the vector register save area of this CPU. Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/zcore: remove invalid kfree in init_cpu_infoMartin Schwidefsky1-1/+0
The extended save area for the boot CPU has been allocated by smp_save_dump_cpus() with memblock_alloc() and may not be freed with kfree(). Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27s390/zcrypt: Fix AP queue handling if queue is fullIngo Tuchscherer1-1/+3
When the AP queue depth of requests was reached additional requests have been ignored. These request are stuck in the request queue. The AP queue handling now push the next waiting request into the queue after fetching a previous serviced and finished reply. Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-25Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-9/+10
Pull vfs fixes from Al Viro: "A couple of fixes for sendfile lockups caught by Dmitry + a fix for ancient sysvfs symlink breakage" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Avoid softlockups with sendfile(2) vfs: Make sendfile(2) killable even better fix sysvfs symlinks
2015-11-25Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds3-16/+24
Pull more block layer fixes from Jens Axboe: "I wasn't going to send off a new pull before next week, but the blk flush fix from Jan from the other day introduced a regression. It's rare enough not to have hit during testing, since it requires both a device that rejects the first flush, and bad timing while it does that. But since someone did hit it, let's get the revert into 4.4-rc3 so we don't have a released rc with that known issue. Apart from that revert, three other fixes: - From Christoph, a fix for a missing unmap in NVMe request preparation. - An NVMe fix from Nishanth that fixes data corruption on powerpc. - Also from Christoph, fix a list_del() attempt on blk-mq that didn't have a matching list_add() at timer start" * 'for-linus' of git://git.kernel.dk/linux-block: Revert "blk-flush: Queue through IO scheduler when flush not required" block: fix blk_abort_request for blk-mq drivers nvme: add missing unmaps in nvme_queue_rq NVMe: default to 4k device page size
2015-11-25Revert "blk-flush: Queue through IO scheduler when flush not required"Jens Axboe1-1/+1
This reverts commit 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Jan writes: -- Thanks for report! After some investigation I found out we allocate elevator specific data in __get_request() only for non-flush requests. And this is actually required since the flush machinery uses the space in struct request for something else. Doh. So my patch is just wrong and not easy to fix since at the time __get_request() is called we are not sure whether the flush machinery will be used in the end. Jens, please revert 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Thanks! I'm somewhat surprised that you can reliably hit the race where flushing gets disabled for the device just while the request is in flight. But I guess during boot it makes some sense. -- So let's just revert it, we can fix the queue run manually after the fact. This race is rare enough that it didn't trigger in testing, it requires the specific disable-while-in-flight scenario to trigger.
2015-11-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds21-111/+171
Pull KVM fixes from Paolo Bonzini: "Bug fixes for all architectures. Nothing really stands out" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: nVMX: remove incorrect vpid check in nested invvpid emulation arm64: kvm: report original PAR_EL1 upon panic arm64: kvm: avoid %p in __kvm_hyp_panic KVM: arm/arm64: vgic: Trust the LR state for HW IRQs KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active KVM: arm/arm64: Fix preemptible timer active state crazyness arm64: KVM: Add workaround for Cortex-A57 erratum 834220 arm64: KVM: Fix AArch32 to AArch64 register mapping ARM/arm64: KVM: test properly for a PTE's uncachedness KVM: s390: fix wrong lookup of VCPUs by array index KVM: s390: avoid memory overwrites on emergency signal injection KVM: Provide function for VCPU lookup by id KVM: s390: fix pfmf intercept handler KVM: s390: enable SIMD only when no VCPUs were created KVM: x86: request interrupt window when IRQ chip is split KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection KVM: x86: fix interrupt window handling in split IRQ chip case MIPS: KVM: Uninit VCPU in vcpu_create error path MIPS: KVM: Fix CACHE immediate offset sign extension ...
2015-11-25KVM: nVMX: remove incorrect vpid check in nested invvpid emulationHaozhong Zhang1-5/+0
This patch removes the vpid check when emulating nested invvpid instruction of type all-contexts invalidation. The existing code is incorrect because: (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate Translations Based on VPID", invvpid instruction does not check vpid in the invvpid descriptor when its type is all-contexts invalidation. (2) According to the same document, invvpid of type all-contexts invalidation does not require there is an active VMCS, so/and get_vmcs12() in the existing code may result in a NULL-pointer dereference. In practice, it can crash both KVM itself and L1 hypervisors that use invvpid (e.g. Xen). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-24block: fix blk_abort_request for blk-mq driversChristoph Hellwig1-3/+5
We only added the request to the request list for the !blk-mq case, so we should only delete it in that case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24nvme: add missing unmaps in nvme_queue_rqChristoph Hellwig1-3/+12
When we fail various metadata related operations in nvme_queue_rq we need to unmap the data SGL. Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24NVMe: default to 4k device page sizeNishanth Aravamudan1-9/+6
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PRP entries will match the device's page size, and that the DMA aligment matches the kernel's page aligment. On Power, the the IOMMU page size, as mentioned above, can be 4K, while the device can have a page size of 8K, while the kernel has a page size of 64K. This eventually trips the BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple of 4K but not 8K (e.g., 0xF000). In this particular case of page sizes, we clearly want to use the IOMMU's page size in the driver. And generally, the NVMe driver in this function should be using the IOMMU's page size for the default device page size, rather than the kernel's page size. There is not currently an API to obtain the IOMMU's page size across all architectures and in the interest of a stop-gap fix to this functional issue, default the NVMe device page size to 4K, with the intent of adding such an API and implementation across all architectures in the next merge window. With the functionally equivalent v3 of this patch, our hardware test exerciser survives when using 32-bit DMA; without the patch, the kernel will BUG within a few minutes. Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24Merge tag 'dm-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds4-29/+36
Pull device mapper fixes from Mike Snitzer: "Two fixes for 4.4-rc1's DM ioctl changes that introduced the potential for infinite recursion on ioctl (with DM multipath). And four stable fixes: - A DM thin-provisioning fix to restore 'error_if_no_space' setting when a thin-pool is made writable again (after having been out of space). - A DM thin-provisioning fix to properly advertise discard support for thin volumes that are stacked on a thin-pool whose underlying data device doesn't support discards. - A DM ioctl fix to allow ctrl-c to break out of an ioctl retry loop when DM multipath is configured to 'queue_if_no_path'. - A DM crypt fix for a possible hang on dm-crypt device removal" * tag 'dm-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm thin: fix regression in advertised discard limits dm crypt: fix a possible hang due to race condition on exit dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path dm: do not reuse dm_blk_ioctl block_device input as local variable dm: fix ioctl retry termination with signal dm thin: restore requested 'error_if_no_space' setting on OODS to WRITE transition
2015-11-24pidns: fix NULL dereference in __task_pid_nr_ns()Eric Dumazet1-2/+2
I got a crash during a "perf top" session that was caused by a race in __task_pid_nr_ns() : pid_nr_ns() was inlined, but apparently compiler chose to read task->pids[type].pid twice, and the pid->level dereference crashed because we got a NULL pointer at the second read : if (pid && ns->level <= pid->level) { // CRASH Just use RCU API properly to solve this race, and not worry about "perf top" crashing hosts :( get_task_pid() can benefit from same fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-24Merge tag 'kvm-arm-for-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-masterPaolo Bonzini345-5556/+4249
KVM/ARM Fixes for v4.4-rc3. Includes some timer fixes, properly unmapping PTEs, an errata fix, and two tweaks to the EL2 panic code.
2015-11-24Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds17-244/+525
Pull block layer fixes from Jens Axboe: "A round of fixes/updates for the current series. This looks a little bigger than it is, but that's mainly because we pushed the lightnvm enabled null_blk change out of the merge window so it could be updated a bit. The rest of the volume is also mostly lightnvm. In particular: - Lightnvm. Various fixes, additions, updates from Matias and Javier, as well as from Wenwei Tao. - NVMe: - Fix for potential arithmetic overflow from Keith. - Also from Keith, ensure that we reap pending completions from a completion queue before deleting it. Fixes kernel crashes when resetting a device with IO pending. - Various little lightnvm related tweaks from Matias. - Fixup flushes to go through the IO scheduler, for the cases where a flush is not required. Fixes a case in CFQ where we would be idling and not see this request, hence not break the idling. From Jan Kara. - Use list_{first,prev,next} in elevator.c for cleaner code. From Gelian Tang. - Fix for a warning trigger on btrfs and raid on single queue blk-mq devices, where we would flush plug callbacks with preemption disabled. From me. - A mac partition validation fix from Kees Cook. - Two merge fixes from Ming, marked stable. A third part is adding a new warning so we'll notice this quicker in the future, if we screw up the accounting. - Cleanup of thread name/creation in mtip32xx from Rasmus Villemoes" * 'for-linus' of git://git.kernel.dk/linux-block: (32 commits) blk-merge: warn if figured out segment number is bigger than nr_phys_segments blk-merge: fix blk_bio_segment_split block: fix segment split blk-mq: fix calling unplug callbacks with preempt disabled mac: validate mac_partition is within sector mtip32xx: use formatting capability of kthread_create_on_node NVMe: reap completion entries when deleting queue lightnvm: add free and bad lun info to show luns lightnvm: keep track of block counts nvme: lightnvm: use admin queues for admin cmds lightnvm: missing free on init error lightnvm: wrong return value and redundant free null_blk: do not del gendisk with lightnvm null_blk: use device addressing mode null_blk: use ppa_cache pool NVMe: Fix possible arithmetic overflow for max segments blk-flush: Queue through IO scheduler when flush not required null_blk: register as a LightNVM device elevator: use list_{first,prev,next}_entry lightnvm: cleanup queue before target removal ...
2015-11-24arm64: kvm: report original PAR_EL1 upon panicMark Rutland1-1/+5
If we call __kvm_hyp_panic while a guest context is active, we call __restore_sysregs before acquiring the system register values for the panic, in the process throwing away the PAR_EL1 value at the point of the panic. This patch modifies __kvm_hyp_panic to stash the PAR_EL1 value prior to restoring host register values, enabling us to report the original values at the point of the panic. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: kvm: avoid %p in __kvm_hyp_panicMark Rutland1-1/+1
Currently __kvm_hyp_panic uses %p for values which are not pointers, such as the ESR value. This can confusingly lead to "(null)" being printed for the value. Use %x instead, and only use %p for host pointers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: vgic: Trust the LR state for HW IRQsChristoffer Dall1-14/+2
We were probing the physial distributor state for the active state of a HW virtual IRQ, because we had seen evidence that the LR state was not cleared when the guest deactivated a virtual interrupted. However, this issue turned out to be a software bug in the GIC, which was solved by: 84aab5e68c2a5e1e18d81ae8308c3ce25d501b29 (KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active, 2015-11-24) Therefore, get rid of the complexities and just look at the LR. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.activeChristoffer Dall3-24/+40
We were incorrectly removing the active state from the physical distributor on the timer interrupt when the timer output level was deasserted. We shouldn't be doing this without considering the virtual interrupt's active state, because the architecture requires that when an LR has the HW bit set and the pending or active bits set, then the physical interrupt must also have the corresponding bits set. This addresses an issue where we have been observing an inconsistency between the LR state and the physical distributor state where the LR state was active and the physical distributor was not active, which shouldn't happen. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: Fix preemptible timer active state crazynessChristoffer Dall1-6/+1
We were setting the physical active state on the GIC distributor in a preemptible section, which could cause us to set the active state on different physical CPU from the one we were actually going to run on, hacoc ensues. Since we are no longer descheduling/scheduling soft timers in the flush/sync timer functions, simply moving the timer flush into a non-preemptible section. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: KVM: Add workaround for Cortex-A57 erratum 834220Marc Zyngier4-1/+38
Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults when a Stage 1 permission fault or device alignment fault should have been reported. This patch implements the workaround (which is to validate that the Stage-1 translation actually succeeds) by using code patching. Cc: stable@vger.kernel.org Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: KVM: Fix AArch32 to AArch64 register mappingMarc Zyngier2-4/+6
When running a 32bit guest under a 64bit hypervisor, the ARMv8 architecture defines a mapping of the 32bit registers in the 64bit space. This includes banked registers that are being demultiplexed over the 64bit ones. On exceptions caused by an operation involving a 32bit register, the HW exposes the register number in the ESR_EL2 register. It was so far understood that SW had to distinguish between AArch32 and AArch64 accesses (based on the current AArch32 mode and register number). It turns out that I misinterpreted the ARM ARM, and the clue is in D1.20.1: "For some exceptions, the exception syndrome given in the ESR_ELx identifies one or more register numbers from the issued instruction that generated the exception. Where the exception is taken from an Exception level using AArch32 these register numbers give the AArch64 view of the register." Which means that the HW is already giving us the translated version, and that we shouldn't try to interpret it at all (for example, doing an MMIO operation from the IRQ mode using the LR register leads to very unexpected behaviours). The fix is thus not to perform a call to vcpu_reg32() at all from vcpu_reg(), and use whatever register number is supplied directly. The only case we need to find out about the mapping is when we actively generate a register access, which only occurs when injecting a fault in a guest. Cc: stable@vger.kernel.org Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24ARM/arm64: KVM: test properly for a PTE's uncachednessArd Biesheuvel1-8/+7
The open coded tests for checking whether a PTE maps a page as uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern, which is not guaranteed to work since the type of a mapping is not a set of mutually exclusive bits For HYP mappings, the type is an index into the MAIR table (i.e, the index itself does not contain any information whatsoever about the type of the mapping), and for stage-2 mappings it is a bit field where normal memory and device types are defined as follows: #define MT_S2_NORMAL 0xf #define MT_S2_DEVICE_nGnRE 0x1 I.e., masking *and* comparing with the latter matches on the former, and we have been getting lucky merely because the S2 device mappings also have the PTE_UXN bit set, or we would misidentify memory mappings as device mappings. Since the unmap_range() code path (which contains one instance of the flawed test) is used both for HYP mappings and stage-2 mappings, and considering the difference between the two, it is non-trivial to fix this by rewriting the tests in place, as it would involve passing down the type of mapping through all the functions. However, since HYP mappings and stage-2 mappings both deal with host physical addresses, we can simply check whether the mapping is backed by memory that is managed by the host kernel, and only perform the D-cache maintenance if this is the case. Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-23blk-merge: warn if figured out segment number is bigger than nr_phys_segmentsMing Lei1-0/+6
We had seen lots of reports of this kind issue, so add one warnning in blk-merge, then it can be triggered easily and avoid to depend on warning/bug from drivers. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-23blk-merge: fix blk_bio_segment_splitMing Lei1-3/+19
Commit bdced438acd83a(block: setup bi_phys_segments after splitting) introduces function of computing bio->bi_phys_segments during bio splitting. Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size arn't computed, so too many physical segments may be obtained for one request since both the two are used to check if one segment across two bios can be possible. This patch fixes the issue by computing the two variables in blk_bio_segment_split(). Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting) Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-23block: fix segment splitMing Lei1-2/+2
Inside blk_bio_segment_split(), previous bvec pointer(bvprvp) always points to the iterator local variable, which is obviously wrong, so fix it by pointing to the local variable of 'bvprv'. Fixes: 5014c311baa2b(block: fix bogus compiler warnings in blk-merge.c) Cc: stable@kernel.org #4.3 Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-23vfs: Avoid softlockups with sendfile(2)Jan Kara1-0/+1
The following test program from Dmitry can cause softlockups or RCU stalls as it copies 1GB from tmpfs into eventfd and we don't have any scheduling point at that path in sendfile(2) implementation: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); Add cond_resched() into __splice_from_pipe() to fix the problem. CC: Dmitry Vyukov <dvyukov@google.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23vfs: Make sendfile(2) killable even betterJan Kara1-0/+7
Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where sendfile(2) was doing a lot of tiny writes into a filesystem and thus was unkillable for a long time. However sendfile(2) can be (mis)used to issue lots of writes into arbitrary file descriptor such as evenfd or similar special file descriptors which never hit the standard filesystem write path and thus are still unkillable. E.g. the following example from Dmitry burns CPU for ~16s on my test system without possibility to be killed: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); There are actually quite a few tests for pending signals in sendfile code however we data to write is always available none of them seems to trigger. So fix the problem by adding a test for pending signal into splice_from_pipe_next() also before the loop waiting for pipe buffers to be available. This should fix all the lockup issues with sendfile of the do-ton-of-tiny-writes nature. CC: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23fix sysvfs symlinksAl Viro1-9/+2
The thing got broken back in 2002 - sysvfs does *not* have inline symlinks; even short ones have bodies stored in the first block of file. sysv_symlink() handles that correctly; unfortunately, attempting to look an existing symlink up will end up confusing them for inline symlinks, and interpret the block number containing the body as the body itself. Nobody has noticed until now, which says something about the level of testing sysvfs gets ;-/ Cc: stable@vger.kernel.org # all of them, not that anyone cared Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23Merge tag 'linux-kselftest-4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftestLinus Torvalds2-5/+8
Pull kselftest fixes from Shuah Khan: "This update consists of one minor documentation fix and a fix to an existing test" * tag 'linux-kselftest-4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/seccomp: Get page size from sysconf tools:testing/selftests: fix typo in futex/README
2015-11-23dm thin: fix regression in advertised discard limitsMike Snitzer1-3/+2
When establishing a thin device's discard limits we cannot rely on the underlying thin-pool device's discard capabilities (which are inherited from the thin-pool's underlying data device) given that DM thin devices must provide discard support even when the thin-pool's underlying data device doesn't support discards. Users were exposed to this thin device discard limits regression if their thin-pool's underlying data device does _not_ support discards. This regression caused all upper-layers that called the blkdev_issue_discard() interface to not be able to issue discards to thin devices (because discard_granularity was 0). This regression wasn't caught earlier because the device-mapper-test-suite's extensive 'thin-provisioning' discard tests are only ever performed against thin-pool's with data devices that support discards. Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature rather than inferring whether or not a thin device's discard support should be enabled by looking at the thin-pool's discard_granularity. Fixes: 216076705 ("dm thin: disable discard support for thin devices if pool's is disabled") Reported-by: Mike Gerber <mike@sprachgewalt.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+
2015-11-22Linux 4.4-rc2Linus Torvalds1-1/+1
2015-11-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds6-76/+182
Merge slub bulk allocator updates from Andrew Morton: "This missed the merge window because I was waiting for some repairs to come in. Nothing actually uses the bulk allocator yet and the changes to other code paths are pretty small. And the net guys are waiting for this so they can start merging the client code" More comments from Jesper Dangaard Brouer: "The kmem_cache_alloc_bulk() call, in mm/slub.c, were included in previous kernel. The present version contains a bug. Vladimir Davydov noticed it contained a bug, when kernel is compiled with CONFIG_MEMCG_KMEM (see commit 03ec0ed57ffc: "slub: fix kmem cgroup bug in kmem_cache_alloc_bulk"). Plus the mem cgroup counterpart in kmem_cache_free_bulk() were missing (see commit 033745189b1b "slub: add missing kmem cgroup support to kmem_cache_free_bulk"). I don't consider the fix stable-material because there are no in-tree users of the API. But with known bugs (for memcg) I cannot start using the API in the net-tree" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: slab/slub: adjust kmem_cache_alloc_bulk API slub: add missing kmem cgroup support to kmem_cache_free_bulk slub: fix kmem cgroup bug in kmem_cache_alloc_bulk slub: optimize bulk slowpath free by detached freelist slub: support for bulk free with SLUB freelists
2015-11-22Merge tag 'tty-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds11-13/+15
Pull tty/serial fixes from Greg KH: "Here are a few small tty/serial driver fixes for 4.4-rc2 that resolve some reported problems. All have been in linux-next, full details are in the shortlog below" * tag 'tty-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: export fsl8250_handle_irq serial: 8250_mid: Add missing dependency tty: audit: Fix audit source serial: etraxfs-uart: Fix crash serial: fsl_lpuart: Fix earlycon support bcm63xx_uart: Use the device name when registering an interrupt tty: Fix direct use of tty buffer work tty: Fix tty_send_xchar() lock order inversion