linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-09-17	arm64: KVM: Remove all traces of the ThumbEE registers	Will Deacon	4	-29/+5
	Although the ThumbEE registers and traps were present in earlier versions of the v8 architecture, it was retrospectively removed and so we can do the same. Whilst this breaks migrating a guest started on a previous version of the kernel, it is much better to kill these (non existent) registers as soon as possible. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> [maz: added commend about migration] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-17	arm: KVM: Disable virtual timer even if the guest is not using it	Marc Zyngier	1	-2/+4
	When running a guest with the architected timer disabled (with QEMU and the kernel_irqchip=off option, for example), it is important to make sure the timer gets turned off. Otherwise, the guest may try to enable it anyway, leading to a screaming HW interrupt. The fix is to unconditionally turn off the virtual timer on guest exit. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-17	arm64: KVM: Disable virtual timer even if the guest is not using it	Marc Zyngier	1	-2/+3
	When running a guest with the architected timer disabled (with QEMU and the kernel_irqchip=off option, for example), it is important to make sure the timer gets turned off. Otherwise, the guest may try to enable it anyway, leading to a screaming HW interrupt. The fix is to unconditionally turn off the virtual timer on guest exit. Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-16	arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources	Pavel Fedin	1	-1/+1
	Until b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops"), kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(), and vgic_v2_map_resources() still has it. But now vm_ops are not initialized until we call kvm_vgic_create(). Therefore kvm_vgic_map_resources() can being called without a VGIC, and we die because vm_ops.map_resources is NULL. Fixing this restores QEMU's kernel-irqchip=off option to a working state, allowing to use GIC emulation in userspace. Fixes: b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops") Cc: stable@vger.kernel.org Signed-off-by: Pavel Fedin <p.fedin@samsung.com> [maz: reworked commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-16	arm: KVM: Fix incorrect device to IPA mapping	Marek Majtyka	1	-2/+4
	A critical bug has been found in device memory stage1 translation for VMs with more then 4GB of address space. Once vm_pgoff size is smaller then pa (which is true for LPAE case, u32 and u64 respectively) some more significant bits of pa may be lost as a shift operation is performed on u32 and later cast onto u64. Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12 expected pa(u64): 0x0000002010030000 produced pa(u64): 0x0000000010030000 The fix is to change the order of operations (casting first onto phys_addr_t and then shifting). Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> [maz: fixed changelog and patch formatting] Cc: stable@vger.kernel.org Signed-off-by: Marek Majtyka <marek.majtyka@tieto.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-16	arm64: KVM: Fix user access for debug registers	Marc Zyngier	1	-4/+4
	When setting the debug register from userspace, make sure that copy_from_user() is called with its parameters in the expected order. It otherwise doesn't do what you think. Fixes: 84e690bfbed1 ("KVM: arm64: introduce vcpu->arch.debug_ptr") Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-14	KVM: arm64: add workaround for Cortex-A57 erratum #852523	Will Deacon	1	-1/+3
	When restoring the system register state for an AArch32 guest at EL2, writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57, which can lead to the guest effectively running with junk in the DACR and running into unexpected domain faults. This patch works around the issue by re-ordering our restoration of the AArch32 register aliases so that they happen before the AArch64 system registers. Ensuring that the registers are restored in this order guarantees that they will be correctly synchronised by the core. Cc: <stable@vger.kernel.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-04	arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores	Alexander Spyridakis	1	-4/+8
	If a guest requests the affinity info for a non-existing vCPU we need to properly return an error, instead of erroneously reporting an off state. Signed-off-by: Alexander Spyridakis <a.spyridakis@virtualopensystems.com> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-04	arm64: KVM: set {v,}TCR_EL2 RES1 bits	Mark Rutland	1	-3/+7
	Currently we don't set the RES1 bits of TCR_EL2 and VTCR_EL2 when configuring them, which could lead to unexpected behaviour when an architectural meaning is defined for those bits. Set the RES1 bits to avoid issues. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-04	arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0	Christoffer Dall	1	-0/+8
	Provide a better quality of implementation and be architecture compliant on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on reset of the timer. This change alone fixes the UEFI reset issue reported by Laszlo back in February. Cc: Laszlo Ersek <lersek@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Drew Jones <drjones@redhat.com> Cc: Wei Huang <wei@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-04	arm/arm64: KVM: vgic: Move active state handling to flush_hwstate	Christoffer Dall	1	-16/+26
	We currently set the physical active state only when we inject a new pending virtual interrupt, but this is actually not correct, because we could have been preempted and run something else on the system that resets the active state to clear. This causes us to run the VM with the timer set to fire, but without setting the physical active state. The solution is to always check the LR configurations, and we if have a mapped interrupt in the LR in either the pending or active state (virtual), then set the physical active state. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-19	arm: KVM: keep arm vfp/simd exit handling consistent with arm64	Mario Smarduch	1	-6/+8
	After enhancing arm64 FP/SIMD exit handling, ARMv7 VFP exit branch is moved to guest trap handling. This allows us to keep exit handling flow between both architectures consistent. Signed-off-by: Mario Smarduch <m.smarduch@samsung.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-19	arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits	Mario Smarduch	2	-10/+68
	This patch only saves and restores FP/SIMD registers on Guest access. To do this cptr_el2 FP/SIMD trap is set on Guest entry and later checked on exit. lmbench, hackbench show significant improvements, for 30-50% exits FP/SIMD context is not saved/restored [chazy/maz: fixed save/restore logic for 32bit guests] Signed-off-by: Mario Smarduch <m.smarduch@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	KVM: arm/arm64: timer: Allow the timer to control the active state	Marc Zyngier	4	-15/+29
	In order to remove the crude hack where we sneak the masked bit into the timer's control register, make use of the phys_irq_map API control the active state of the interrupt. This causes some limited changes to allow for potential error propagation. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	KVM: arm/arm64: vgic: Prevent userspace injection of a mapped interrupt	Marc Zyngier	2	-33/+72
	Virtual interrupts mapped to a HW interrupt should only be triggered from inside the kernel. Otherwise, you could end up confusing the kernel (and the GIC's) state machine. Rearrange the injection path so that kvm_vgic_inject_irq is used for non-mapped interrupts, and kvm_vgic_inject_mapped_irq is used for mapped interrupts. The latter should only be called from inside the kernel (timer, irqfd). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active	Marc Zyngier	2	-0/+26
	In order to control the active state of an interrupt, introduce a pair of accessors allowing the state to be set/queried. This only affects the logical state, and the HW state will only be applied at world-switch time. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest	Marc Zyngier	1	-3/+86
	To allow a HW interrupt to be injected into a guest, we lookup the guest virtual interrupt in the irq_phys_map list, and if we have a match, encode both interrupts in the LR. We also mark the interrupt as "active" at the host distributor level. On guest EOI on the virtual interrupt, the host interrupt will be deactivated. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts	Marc Zyngier	3	-1/+235
	In order to be able to feed physical interrupts to a guest, we need to be able to establish the virtual-physical mapping between the two worlds. The mappings are kept in a set of RCU lists, indexed by virtual interrupts. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs	Marc Zyngier	1	-1/+1
	We only set the irq_queued flag for level interrupts, meaning that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate for all interrupts. This will allow us to inject edge HW interrupts, for which the state ACTIVE+PENDING is not allowed. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR	Marc Zyngier	4	-5/+38
	Now that struct vgic_lr supports the LR_HW bit and carries a hwirq field, we can encode that information into the list registers. This patch provides implementations for both GICv2 and GICv3. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields	Marc Zyngier	1	-3/+7
	As we're about to cram more information in the vgic_lr structure (HW interrupt number and additional state information), we switch to a layout similar to the HW's: - use bitfields to save space (we don't need more than 10 bits to represent the irq numbers) - source CPU and HW interrupt can share the same field, as a SGI doesn't have a physical line. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	arm/arm64: KVM: Move vgic handling to a non-preemptible section	Marc Zyngier	1	-3/+15
	As we're about to introduce some serious GIC-poking to the vgic code, it is important to make sure that we're going to poke the part of the GIC that belongs to the CPU we're about to run on (otherwise, we'd end up with some unexpected interrupts firing)... Introducing a non-preemptible section in kvm_arch_vcpu_ioctl_run prevents the problem from occuring. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	arm/arm64: KVM: Fix ordering of timer/GIC on guest entry	Marc Zyngier	1	-4/+3
	As we now inject the timer interrupt when we're about to enter the guest, it makes a lot more sense to make sure this happens before the vgic code queues the pending interrupts. Otherwise, we get the interrupt on the following exit, which is not great for latency (and leads to all kind of bizarre issues when using with active interrupts at the HW level). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-08-12	arm64: KVM: remove remaining reference to vgic_sr_vectors	Vladimir Murzin	2	-7/+0
	Since commit 8a14849 (arm64: KVM: Switch vgic save/restore to alternative_insn) vgic_sr_vectors is not used anymore, so remove remaining leftovers and kill the structure. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12	arm64/kvm: Add generic v8 KVM target	Suzuki K. Poulose	3	-3/+12
	This patch adds a generic ARM v8 KVM target cpu type for use by the new CPUs which eventualy ends up using the common sys_reg table. For backward compatibility the existing targets have been preserved. Any new target CPU that can be covered by generic v8 sys_reg tables should make use of the new generic target. Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Acked-by: Marc Zyngier <Marc.Zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm64: add trace points for guest_debug debug	Alex Bennée	4	-1/+179
	This includes trace points for: kvm_arch_setup_guest_debug kvm_arch_clear_guest_debug I've also added some generic register setting trace events and also a trace point to dump the array of hardware registers. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG	Alex Bennée	8	-19/+89
	Finally advertise the KVM capability for SET_GUEST_DEBUG. Once arm support is added this check can be moved to the common kvm_vm_ioctl_check_extension() code. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm64: guest debug, HW assisted debug support	Alex Bennée	1	-0/+2
	This adds support for userspace to control the HW debug registers for guest debug. In the debug ioctl we copy an IMPDEF registers into a new register set called host_debug_state. We use the recently introduced vcpu parameter debug_ptr to select which register set is copied into the real registers when world switch occurs. I've made some helper functions from hw_breakpoint.c more widely available for re-use. As with single step we need to tweak the guest registers to enable the exceptions so we need to save and restore those bits. Two new capabilities have been added to the KVM_EXTENSION ioctl to allow userspace to query the number of hardware break and watch points available on the host hardware. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm64: introduce vcpu->arch.debug_ptr	Alex Bennée	9	-47/+316
	This introduces a level of indirection for the debug registers. Instead of using the sys_regs[] directly we store registers in a structure in the vcpu. The new kvm_arm_reset_debug_ptr() sets the debug ptr to the guest context. Because we no longer give the sys_regs offset for the sys_reg_desc->reg field, but instead the index into a debug-specific struct we need to add a number of additional trap functions for each register. Also as the generic generic user-space access code no longer works we have introduced a new pair of function pointers to the sys_reg_desc structure to override the generic code when needed. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm64: re-factor hyp.S debug register code	Alex Bennée	1	-379/+138
	This is a pre-cursor to sharing the code with the guest debug support. This replaces the big macro that fishes data out of a fixed location with a more general helper macro to restore a set of debug registers. It uses macro substitution so it can be re-used for debug control and value registers. It does however rely on the debug registers being 64 bit aligned (as they happen to be in the hyp ABI). Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm64: guest debug, add support for single-step	Alex Bennée	4	-5/+80
	This adds support for single-stepping the guest. To do this we need to manipulate the guests PSTATE.SS and MDSCR_EL1.SS bits to trigger stepping. We take care to preserve MDSCR_EL1 and trap access to it to ensure we don't affect the apparent state of the guest. As we have to enable trapping of all software debug exceptions we suppress the ability of the guest to single-step itself. If we didn't we would have to deal with the exception arriving while the guest was in kernelspace when the guest is expecting to single-step userspace. This is something we don't want to unwind in the kernel. Once the host is no longer debugging the guest its ability to single-step userspace is restored. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm64: guest debug, add SW break point support	Alex Bennée	4	-2/+41
	This adds support for SW breakpoints inserted by userspace. We do this by trapping all guest software debug exceptions to the hypervisor (MDCR_EL2.TDE). The exit handler sets an exit reason of KVM_EXIT_DEBUG with the kvm_debug_exit_arch structure holding the exception syndrome information. It will be up to userspace to extract the PC (via GET_ONE_REG) and determine if the debug event was for a breakpoint it inserted. If not userspace will need to re-inject the correct exception restart the hypervisor to deliver the debug exception to the guest. Any other guest software debug exception (e.g. single step or HW assisted breakpoints) will cause an error and the VM to be killed. This is addressed by later patches which add support for the other debug types. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm: introduce kvm_arm_init/setup/clear_debug	Alex Bennée	8	-12/+108
	This is a precursor for later patches which will need to do more to setup debug state before entering the hyp.S switch code. The existing functionality for setting mdcr_el2 has been moved out of hyp.S and now uses the value kept in vcpu->arch.mdcr_el2. As the assembler used to previously mask and preserve MDCR_EL2.HPMN I've had to add a mechanism to save the value of mdcr_el2 as a per-cpu variable during the initialisation code. The kernel never sets this number so we are assuming the bootcode has set up the correct value here. This also moves the conditional setting of the TDA bit from the hyp code into the C code which is currently used for the lazy debug register context switch code. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl	Alex Bennée	4	-8/+34
	This commit adds a stub function to support the KVM_SET_GUEST_DEBUG ioctl. Any unsupported flag will return -EINVAL. For now, only KVM_GUESTDBG_ENABLE is supported, although it won't have any effects. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: arm64: guest debug, define API headers	Alex Bennée	1	-0/+27
	This commit defines the API headers for guest debugging. There are two architecture specific debug structures: - kvm_guest_debug_arch, allows us to pass in HW debug registers - kvm_debug_exit_arch, signals exception and possible faulting address The type of debugging being used is controlled by the architecture specific control bits of the kvm_guest_debug->control flags in the ioctl structure. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21	KVM: add comments for kvm_debug_exit_arch struct	Alex Bennée	2	-1/+6
	Bring into line with the comments for the other structures and their KVM_EXIT_* cases. Also update api.txt to reflect use in kvm_run documentation. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-12	Linux 4.2-rc2	Linus Torvalds	1	-1/+1

2015-07-12	Revert "drm/i915: Use crtc_state->active in primary check_plane func"	Linus Torvalds	1	-1/+1
	This reverts commit dec4f799d0a4c9edae20512fa60b0a36f3299ca2. Jörg Otte reports a NULL pointder dereference due to this commit, as 'crtc_state' very much can be NULL: crtc_state = state->base.state ? intel_atomic_get_crtc_state(state->base.state, intel_crtc) : NULL; So the change to test 'crtc_state->base.active' cannot possibly be correct as-is. There may be some other minimal fix (like just checking crtc_state for NULL), but I'm just reverting it now for the rc2 release, and people like Daniel Vetter who actually know this code will figure out what the right solution is in the longer term. Reported-and-bisected-by: Jörg Otte <jrg.otte@gmail.com> Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-12	freeing unlinked file indefinitely delayed	Al Viro	1	-2/+5
	Normally opening a file, unlinking it and then closing will have the inode freed upon close() (provided that it's not otherwise busy and has no remaining links, of course). However, there's one case where that does not happen. Namely, if you open it by fhandle with cold dcache, then unlink() and close(). In normal case you get d_delete() in unlink(2) notice that dentry is busy and unhash it; on the final dput() it will be forcibly evicted from dcache, triggering iput() and inode removal. In this case, though, we end up with two dentries - disconnected (created by open-by-fhandle) and regular one (used by unlink()). The latter will have its reference to inode dropped just fine, but the former will not - it's considered hashed (it is on the ->s_anon list), so it will stay around until the memory pressure will finally do it in. As the result, we have the final iput() delayed indefinitely. It's trivial to reproduce - void flush_dcache(void) { system("mount -o remount,rw /"); } static char buf[20 * 1024 * 1024]; main() { int fd; union { struct file_handle f; char buf[MAX_HANDLE_SZ]; } x; int m; x.f.handle_bytes = sizeof(x); chdir("/root"); mkdir("foo", 0700); fd = open("foo/bar", O_CREAT \| O_RDWR, 0600); close(fd); name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0); flush_dcache(); fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR); unlink("foo/bar"); write(fd, buf, sizeof(buf)); system("df ."); /* 20Mb eaten / close(fd); system("df ."); / should've freed those 20Mb / flush_dcache(); system("df ."); / should be the same as #2 */ } will spit out something like Filesystem 1K-blocks Used Available Use% Mounted on /dev/root 322023 303843 1131 100% / Filesystem 1K-blocks Used Available Use% Mounted on /dev/root 322023 303843 1131 100% / Filesystem 1K-blocks Used Available Use% Mounted on /dev/root 322023 283282 21692 93% / - inode gets freed only when dentry is finally evicted (here we trigger than by remount; normally it would've happened in response to memory pressure hell knows when). Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/ Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-12	fix a braino in ovl_d_select_inode()	Al Viro	1	-0/+3
	when opening a directory we want the overlayfs inode, not one from the topmost layer. Reported-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Tested-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-12	9p: don't leave a half-initialized inode sitting around	Al Viro	2	-4/+2
	Cc: stable@vger.kernel.org # all branches Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-11	tick/broadcast: Prevent NULL pointer dereference	Thomas Gleixner	1	-8/+10
	Dan reported that the recent changes to the broadcast code introduced a potential NULL dereference. Add the proper check. Fixes: e0454311903d "tick/broadcast: Sanity check the shutdown of the local clock_event" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-10	selinux: fix mprotect PROT_EXEC regression caused by mm change	Stephen Smalley	1	-1/+2
	commit 66fc13039422ba7df2d01a8ee0873e4ef965b50b ("mm: shmem_zero_setup skip security check and lockdep conflict with XFS") caused a regression for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on shared anonymous mappings. However, even before that regression, the checking on such mprotect PROT_EXEC calls was inconsistent with the checking on a mmap PROT_EXEC call for a shared anonymous mapping. On a mmap, the security hook is passed a NULL file and knows it is dealing with an anonymous mapping and therefore applies an execmem check and no file checks. On a mprotect, the security hook is passed a vma with a non-NULL vm_file (as this was set from the internally-created shmem file during mmap) and therefore applies the file-based execute check and no execmem check. Since the aforementioned commit now marks the shmem zero inode with the S_PRIVATE flag, the file checks are disabled and we have no checking at all on mprotect PROT_EXEC. Add a test to the mprotect hook logic for such private inodes, and apply an execmem check in that case. This makes the mmap and mprotect checking consistent for shared anonymous mappings, as well as for /dev/zero and ashmem. Cc: <stable@vger.kernel.org> # 4.1.x Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-07-10	parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results	John David Anglin	5	-168/+212
	The increased use of pdtlb/pitlb instructions seemed to increase the frequency of random segmentation faults building packages. Further, we had a number of cases where TLB inserts would repeatedly fail and all forward progress would stop. The Haskell ghc package caused a lot of trouble in this area. The final indication of a race in pte handling was this syslog entry on sibaris (C8000): swap_free: Unused swap offset entry 00000004 BUG: Bad page map in process mysqld pte:00000100 pmd:019bbec5 addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464 CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1 Backtrace: [<0000000040173eb0>] show_stack+0x20/0x38 [<0000000040444424>] dump_stack+0x9c/0x110 [<00000000402a0d38>] print_bad_pte+0x1a8/0x278 [<00000000402a28b8>] unmap_single_vma+0x3d8/0x770 [<00000000402a4090>] zap_page_range+0xf0/0x198 [<00000000402ba2a4>] SyS_madvise+0x404/0x8c0 Note that the pte value is 0 except for the accessed bit 0x100. This bit shouldn't be set without the present bit. It should be noted that the madvise system call is probably a trigger for many of the random segmentation faults. In looking at the kernel code, I found the following problems: 1) The pte_clear define didn't take TLB lock when clearing a pte. 2) We didn't test pte present bit inside lock in exception support. 3) The pte and tlb locks needed to merged in order to ensure consistency between page table and TLB. This also has the effect of serializing TLB broadcasts on SMP systems. The attached change implements the above and a few other tweaks to try to improve performance. Based on the timing code, TLB purges are very slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it beneficial to test the split_tlb variable to avoid duplicate purges. Probably, all PA 2.0 machines have combined TLBs. I dropped using __flush_tlb_range in flush_tlb_mm as I realized all applications and most threads have a stack size that is too large to make this useful. I added some comments to this effect. Since implementing 1 through 3, I haven't had any random segmentation faults on mx3210 (rp3440) in about one week of building code and running as a Debian buildd. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Helge Deller <deller@gmx.de>
2015-07-10	stifb: Implement hardware accelerated copyarea	Alex Ivanov	1	-2/+38
	This patch adds hardware assisted scrolling. The code is based upon the following investigation: https://parisc.wiki.kernel.org/index.php/NGLE#Blitter A simple 'time ls -la /usr/bin' test shows 1.6x speed increase over soft copy and 2.3x increase over FBINFO_READS_FAST (prefer soft copy over screen redraw) on Artist framebuffer. Signed-off-by: Alex Ivanov <lausgans@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2015-07-10	nfit: add support for NVDIMM "latch" flag	Ross Zwisler	2	-1/+37
	Add support in the NFIT BLK I/O path for the "latch" flag defined in the "Get Block NVDIMM Flags" _DSM function: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf This flag requires the driver to read back the command register after it is written in the block I/O path. This ensures that the hardware has fully processed the new command and moved the aperture appropriately. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10	nfit: update block I/O path to use PMEM API	Ross Zwisler	2	-12/+100
	Update the nfit block I/O path to use the new PMEM API and to adhere to the read/write flows outlined in the "NVDIMM Block Window Driver Writer's Guide": http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf This includes adding support for targeted NVDIMM flushes called "flush hints" in the ACPI 6.0 specification: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf For performance and media durability the mapping for a BLK aperture is moved to a write-combining mapping which is consistent with memcpy_to_pmem() and wmb_blk(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10	tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test	Dan Williams	3	-2/+71
	In preparation for fixing the BLK path to properly use "directed pcommit" enable the unit test infrastructure to emit mock "flush" tables. Writes to these flush addresses trigger a memory controller to flush its internal buffers to persistent media, similar to the x86 "pcommit" instruction. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10	tools/testing/nvdimm: fix return code for unimplemented commands	Dan Williams	1	-1/+1
	The implementation for the new "DIMM Flags" DSM relies on the -ENOTTY return code to indicate that the flags are unimplimented and to fall back to a safe default. As is the -ENXIO error code erroneoously indicates to fail enabling a BLK region. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10	tools/testing/nvdimm: mock ioremap_wt	Dan Williams	2	-0/+7
	In the 4.2-rc1 merge the default_memremap_pmem() implementation switched from ioremap_nocache() to ioremap_wt(). Add it to the list of mocked routines to restore the ability to run the unit tests. Signed-off-by: Dan Williams <dan.j.williams@intel.com>