wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2021-04-01	tracing: Fix stack trace event size	Steven Rostedt (VMware)	1	-1/+2
	Commit cbc3b92ce037 fixed an issue to modify the macros of the stack trace event so that user space could parse it properly. Originally the stack trace format to user space showed that the called stack was a dynamic array. But it is not actually a dynamic array, in the way that other dynamic event arrays worked, and this broke user space parsing for it. The update was to make the array look to have 8 entries in it. Helper functions were added to make it parse it correctly, as the stack was dynamic, but was determined by the size of the event stored. Although this fixed user space on how it read the event, it changed the internal structure used for the stack trace event. It changed the array size from [0] to [8] (added 8 entries). This increased the size of the stack trace event by 8 words. The size reserved on the ring buffer was the size of the stack trace event plus the number of stack entries found in the stack trace. That commit caused the amount to be 8 more than what was needed because it did not expect the caller field to have any size. This produced 8 entries of garbage (and reading random data) from the stack trace event: <idle>-0 [002] d... 1976396.837549: <stack trace> => trace_event_raw_event_sched_switch => __traceiter_sched_switch => __schedule => schedule_idle => do_idle => cpu_startup_entry => secondary_startup_64_no_verify => 0xc8c5e150ffff93de => 0xffff93de => 0 => 0 => 0xc8c5e17800000000 => 0x1f30affff93de => 0x00000004 => 0x200000000 Instead, subtract the size of the caller field from the size of the event to make sure that only the amount needed to store the stack trace is reserved. Link: https://lore.kernel.org/lkml/your-ad-here.call-01617191565-ext-9692@work.hours/ Cc: stable@vger.kernel.org Fixes: cbc3b92ce037 ("tracing: Set kernel_stack's caller size properly") Reported-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-01	idr test suite: Improve reporting from idr_find_test_1	Matthew Wilcox (Oracle)	1	-1/+10
	Instead of just reporting an assertion failure, report enough information that we can start diagnosing exactly went wrong. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01	idr test suite: Create anchor before launching throbber	Matthew Wilcox (Oracle)	1	-2/+2
	The throbber could race with creation of the anchor entry and cause the IDR to have zero entries in it, which would cause the test to fail. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01	idr test suite: Take RCU read lock in idr_find_test_1	Matthew Wilcox (Oracle)	1	-0/+4
	When run on a single CPU, this test would frequently access already-freed memory. Due to timing, this bug never showed up on multi-CPU tests. Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01	radix tree test suite: Register the main thread with the RCU library	Matthew Wilcox (Oracle)	3	-0/+6
	Several test runners register individual worker threads with the RCU library, but neglect to register the main thread, which can lead to objects being freed while the main thread is in what appears to be an RCU critical section. Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01	ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()	Vitaly Kuznetsov	3	-1/+9
	Commit 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g. I'm observing the following on AWS Nitro (e.g r5b.xlarge but other instance types are affected as well): # echo 0 > /sys/devices/system/cpu/cpu0/online # echo 1 > /sys/devices/system/cpu/cpu0/online <10 seconds delay> -bash: echo: write error: Input/output error In fact, the above mentioned commit only revealed the problem and did not introduce it. On x86, to wakeup CPU an NMI is being used and hlt_play_dead()/mwait_play_dead() loops are prepared to handle it: /* * If NMI wants to wake up CPU0, start CPU0. */ if (wakeup_cpu0()) start_cpu0(); cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on systems where it wasn't called before the above mentioned commit) serves the same purpose but it doesn't have a path for CPU0. What happens now on wakeup is: - NMI is sent to CPU0 - wakeup_cpu0_nmi() works as expected - we get back to while (1) loop in acpi_idle_play_dead() - safe_halt() puts CPU0 to sleep again. The straightforward/minimal fix is add the special handling for CPU0 on x86 and that's what the patch is doing. Fixes: 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: 5.10+ <stable@vger.kernel.org> # 5.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-01	selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)	Vitaly Kuznetsov	1	-2/+11
	Add a test for the issue when KVM_SET_CLOCK(0) call could cause TSC page value to go very big because of a signedness issue around hv_clock->system_time. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210326155551.17446-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01	KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update()	Vitaly Kuznetsov	1	-2/+17
	When guest time is reset with KVM_SET_CLOCK(0), it is possible for 'hv_clock->system_time' to become a small negative number. This happens because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, kvm_guest_time_update() does (masterclock in use case): hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; And 'master_kernel_ns' represents the last time when masterclock got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a problem, the difference is very small, e.g. I'm observing hv_clock.system_time = -70 ns. The issue comes from the fact that 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in compute_tsc_page_parameters() becomes a very big number. Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from going negative. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210331124130.337992-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01	KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken	Paolo Bonzini	1	-11/+14
	pvclock_gtod_sync_lock can be taken with interrupts disabled if the preempt notifier calls get_kvmclock_ns to update the Xen runstate information: spin_lock include/linux/spinlock.h:354 [inline] get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587 kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69 kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100 kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline] kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062 So change the users of the spinlock to spin_lock_irqsave and spin_unlock_irqrestore. Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01	KVM: x86: reduce pvclock_gtod_sync_lock critical sections	Paolo Bonzini	1	-6/+4
	There is no need to include changes to vcpu->requests into the pvclock_gtod_sync_lock critical section. The changes to the shared data structures (in pvclock_update_vm_gtod_copy) already occur under the lock. Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01	KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit	Paolo Bonzini	1	-1/+17
	Fixing nested_vmcb_check_save to avoid all TOC/TOU races is a bit harder in released kernels, so do the bare minimum by avoiding that EFER.SVME is cleared. This is problematic because svm_set_efer frees the data structures for nested virtualization if EFER.SVME is cleared. Also check that EFER.SVME remains set after a nested vmexit; clearing it could happen if the bit is zero in the save area that is passed to KVM_SET_NESTED_STATE (the save area of the nested state corresponds to the nested hypervisor's state and is restored on the next nested vmexit). Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01	KVM: SVM: load control fields from VMCB12 before checking them	Paolo Bonzini	1	-4/+6
	Avoid races between check and use of the nested VMCB controls. This for example ensures that the VMRUN intercept is always reflected to the nested hypervisor, instead of being processed by the host. Without this patch, it is possible to end up with svm->nested.hsave pointing to the MSR permission bitmap for nested guests. This bug is CVE-2021-29657. Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-31	drm/amdgpu: check alignment on CPU page for bo map	Xℹ Ruoyao	1	-4/+4
	The page table of AMDGPU requires an alignment to CPU page so we should check ioctl parameters for it. Return -EINVAL if some parameter is unaligned to CPU page, instead of corrupt the page table sliently. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-31	drm/amdgpu: Set a suitable dev_info.gart_page_size	Huacai Chen	1	-2/+2
	In Mesa, dev_info.gart_page_size is used for alignment and it was set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU driver requires an alignment on CPU pages. So, for non-4KB page system, gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE). Signed-off-by: Rui Wang <wangr@lemote.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1 [Xi: rebased for drm-next, use max_t for checkpatch, and reworded commit message.] Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Tested-by: Dan Horák <dan@danny.cz> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-31	drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspend	Alex Deucher	1	-0/+5
	Do the same thing we do for Renoir. We can check, but since the sbios has started DPM, it will always return true which causes the driver to skip some of the SMU init when it shouldn't. Reviewed-by: Zhan Liu <zhan.liu@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-31	drm/amdkfd: dqm fence memory corruption	Qu Huang	7	-12/+12
	Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption. Changes since v1: * Change dqm->fence_addr as a u64 pointer to fix this issue, also fix up query_status and amdkfd_fence_wait_timeout function uses 64 bit fence value to make them consistent. Signed-off-by: Qu Huang <jinsdb@126.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-30	reiserfs: update reiserfs_xattrs_initialized() condition	Tetsuo Handa	1	-1/+1
	syzbot is reporting NULL pointer dereference at reiserfs_security_init() [1], for commit ab17c4f02156c4f7 ("reiserfs: fixup xattr_root caching") is assuming that REISERFS_SB(s)->xattr_root != NULL in reiserfs_xattr_jcreate_nblocks() despite that commit made REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL case possible. I guess that commit 6cb4aff0a77cc0e6 ("reiserfs: fix oops while creating privroot with selinux enabled") wanted to check xattr_root != NULL before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking about the xattr root. The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops. Therefore, update reiserfs_xattrs_initialized() to check both the privroot and the xattr root. Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1] Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 6cb4aff0a77c ("reiserfs: fix oops while creating privroot with selinux enabled") Acked-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jan Kara <jack@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-30	ACPI: scan: Fix _STA getting called on devices with unmet dependencies	Hans de Goede	1	-1/+11
	Commit 71da201f38df ("ACPI: scan: Defer enumeration of devices with _DEP lists") dropped the following 2 lines from acpi_init_device_object(): /* Assume there are unmet deps until acpi_device_dep_initialize() runs */ device->dep_unmet = 1; Leaving the initial value of dep_unmet at the 0 from the kzalloc(). This causes the acpi_bus_get_status() call in acpi_add_single_object() to actually call _STA, even though there maybe unmet deps, leading to errors like these: [ 0.123579] ACPI Error: No handler for Region [ECRM] (00000000ba9edc4c) [GenericSerialBus] (20170831/evregion-166) [ 0.123601] ACPI Error: Region GenericSerialBus (ID=9) has no handler (20170831/exfldio-299) [ 0.123618] ACPI Error: Method parse/execution failed \_SB.I2C1.BAT1._STA, AE_NOT_EXIST (20170831/psparse-550) Fix this by re-adding the dep_unmet = 1 initialization to acpi_init_device_object() and modifying acpi_bus_check_add() to make sure that dep_unmet always gets setup there, overriding the initial 1 value. This re-fixes the issue initially fixed by commit 63347db0affa ("ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs"), which introduced the removed "device->dep_unmet = 1;" statement. This issue was noticed; and the fix tested on a Dell Venue 10 Pro 5055. Fixes: 71da201f38df ("ACPI: scan: Defer enumeration of devices with _DEP lists") Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: 5.11+ <stable@vger.kernel.org> # 5.11+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-30	drm/tegra: sor: Grab runtime PM reference across reset	Thierry Reding	1	-0/+7
	The SOR resets are exclusively shared with the SOR power domain. This means that exclusive access can only be granted temporarily and in order for that to work, a rigorous sequence must be observed. To ensure that a single consumer gets exclusive access to a reset, each consumer must implement a rigorous protocol using the reset_control_acquire() and reset_control_release() functions. However, these functions alone don't provide any guarantees at the system level. Drivers need to ensure that the only a single consumer has access to the reset at the same time. In order for the SOR to be able to exclusively access its reset, it must therefore ensure that the SOR power domain is not powered off by holding on to a runtime PM reference to that power domain across the reset assert/deassert operation. This used to work fine by accident, but was revealed when recently more devices started to rely on the SOR power domain. Fixes: 11c632e1cfd3 ("drm/tegra: sor: Implement acquire/release for reset") Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30	radix tree test suite: Fix compilation	Matthew Wilcox (Oracle)	1	-0/+0
	Commit 4bba4c4bb09a added tools/include/linux/compiler_types.h which includes linux/compiler-gcc.h. Unfortunately, we had our own (empty) compiler_types.h which overrode the one added by that commit, and so we lost the definition of __must_be_array(). Removing our empty compiler_types.h fixes the problem and reduces our divergence from the rest of the tools. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-03-30	XArray: Add xa_limit_16b	Matthew Wilcox (Oracle)	1	-1/+3
	A 16-bit limit is a more common limit than I had realised. Make it generally available. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-03-30	XArray: Fix splitting to non-zero orders	Matthew Wilcox (Oracle)	2	-14/+16
	Splitting an order-4 entry into order-2 entries would leave the array containing pointers to 000040008000c000 instead of 000044448888cccc. This is a one-character fix, but enhance the test suite to check this case. Reported-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-03-30	XArray: Fix split documentation	Matthew Wilcox (Oracle)	1	-3/+4
	I wrote the documentation backwards; the new order of the entry is stored in the xas and the caller passes the old entry. Reported-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-03-30	drm/tegra: dc: Restore coupling of display controllers	Thierry Reding	1	-12/+8
	Coupling of display controllers used to rely on runtime PM to take the companion controller out of reset. Commit fd67e9c6ed5a ("drm/tegra: Do not implement runtime PM") accidentally broke this when runtime PM was removed. Restore this functionality by reusing the hierarchical host1x client suspend/resume infrastructure that's similar to runtime PM and which perfectly fits this use-case. Fixes: fd67e9c6ed5a ("drm/tegra: Do not implement runtime PM") Reported-by: Dmitry Osipenko <digetx@gmail.com> Reported-by: Paul Fertser <fercerpav@gmail.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30	gpu: host1x: Use different lock classes for each client	Mikko Perttunen	2	-5/+14
	To avoid false lockdep warnings, give each client lock a different lock class, passed from the initialization site by macro. Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30	drm/tegra: dc: Don't set PLL clock to 0Hz	Dmitry Osipenko	1	-5/+5
	RGB output doesn't allow to change parent clock rate of the display and PCLK rate is set to 0Hz in this case. The tegra_dc_commit_state() shall not set the display clock to 0Hz since this change propagates to the parent clock. The DISP clock is defined as a NODIV clock by the tegra-clk driver and all NODIV clocks use the CLK_SET_RATE_PARENT flag. This bug stayed unnoticed because by default PLLP is used as the parent clock for the display controller and PLLP silently skips the erroneous 0Hz rate changes because it always has active child clocks that don't permit rate changes. The PLLP isn't acceptable for some devices that we want to upstream (like Samsung Galaxy Tab and ASUS TF700T) due to a display panel clock rate requirements that can't be fulfilled by using PLLP and then the bug pops up in this case since parent clock is set to 0Hz, killing the display output. Don't touch DC clock if pclk=0 in order to fix the problem. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30	KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages	Sean Christopherson	3	-6/+27
	Prevent the TDP MMU from yielding when zapping a gfn range during NX page recovery. If a flush is pending from a previous invocation of the zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU has not accumulated a flush for the current invocation, then yielding will release mmu_lock with stale TLB entries. That being said, this isn't technically a bug fix in the current code, as the TDP MMU will never yield in this case. tdp_mmu_iter_cond_resched() will yield if and only if it has made forward progress, as defined by the current gfn vs. the last yielded (or starting) gfn. Because zapping a single shadow page is guaranteed to (a) find that page and (b) step sideways at the level of the shadow page, the TDP iter will break its loop before getting a chance to yield. But that is all very, very subtle, and will break at the slightest sneeze, e.g. zapping while holding mmu_lock for read would break as the TDP MMU wouldn't be guaranteed to see the present shadow page, and thus could step sideways at a lower level. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-4-seanjc@google.com> [Add lockdep assertion. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping	Sean Christopherson	1	-4/+7
	Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which does the flush itself if and only if it yields (which it will never do in this particular scenario), and otherwise expects the caller to do the flush. If pages are zapped from the TDP MMU but not the legacy MMU, then no flush will occur. Fixes: 29cf0f5007a2 ("kvm: x86/mmu: NX largepage recovery for TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-3-seanjc@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap	Sean Christopherson	1	-11/+12
	When flushing a range of GFNs across multiple roots, ensure any pending flush from a previous root is honored before yielding while walking the tables of the current root. Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local "flush" with the result to avoid redundant flushes. zap_gfn_range() preserves and return the incoming "flush", unless of course the flush was performed prior to yielding and no new flush was triggered. Fixes: 1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") Cc: stable@vger.kernel.org Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	KVM: make: Fix out-of-source module builds	Siddharth Chandrasekaran	1	-1/+1
	Building kvm module out-of-source with, make -C $SRC O=$BIN M=arch/x86/kvm fails to find "irq.h" as the include dir passed to cflags-y does not prefix the source dir. Fix this by prefixing $(srctree) to the include dir path. Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <20210324124347.18336-1-sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	selftests: kvm: make hardware_disable_test less verbose	Vitaly Kuznetsov	1	-5/+5
	hardware_disable_test produces 512 snippets like ... main: [511] waiting semaphore run_test: [511] start vcpus run_test: [511] all threads launched main: [511] waiting 368us main: [511] killing child and this doesn't have much value, let's print this info with pr_debug(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210323104331.1354800-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE	Vitaly Kuznetsov	1	-0/+8
	MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs are only available when X86_FEATURE_PERFCTR_CORE CPUID bit was exposed to the guest. KVM, however, allows these MSRs unconditionally because kvm_pmu_is_valid_msr() -> amd_msr_idx_to_pmc() check always passes and because kvm_pmu_set_msr() -> amd_pmu_set_msr() doesn't fail. In case of a counter (CTRn), no big harm is done as we only increase internal PMC's value but in case of an eventsel (CTLn), we go deep into perf internals with a non-existing counter. Note, kvm_get_msr_common() just returns '0' when these MSRs don't exist and this also seems to contradict architectural behavior which is #GP (I did check one old Opteron host) but changing this status quo is a bit scarier. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210323084515.1346540-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	KVM: x86: remove unused declaration of kvm_write_tsc()	Dongli Zhang	1	-1/+0
	kvm_write_tsc() was renamed and made static since commit 0c899c25d754 ("KVM: x86: do not attempt TSC synchronization on guest writes"). Remove its unused declaration. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210326070334.12310-1-dongli.zhang@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	KVM: clean up the unused argument	Haiwei Li	1	-5/+4
	kvm_msr_ignored_check function never uses vcpu argument. Clean up the function and invokers. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <20210313051032.4171-1-lihaiwei.kernel@gmail.com> Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	tools/kvm_stat: Add restart delay	Stefan Raspl	1	-0/+1
	If this service is enabled and the system rebooted, Systemd's initial attempt to start this unit file may fail in case the kvm module is not loaded. Since we did not specify a delay for the retries, Systemd restarts with a minimum delay a number of times before giving up and disabling the service. Which means a subsequent kvm module load will have kvm running without monitoring. Adding a delay to fix this. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Message-Id: <20210325122949.1433271-1-raspl@linux.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30	mm: fix race by making init_zero_pfn() early_initcall	Ilya Lipnitskiy	1	-1/+1
	There are code paths that rely on zero_pfn to be fully initialized before core_initcall. For example, wq_sysfs_init() is a core_initcall function that eventually results in a call to kernel_execve, which causes a page fault with a subsequent mmput. If zero_pfn is not initialized by then it may not get cleaned up properly and result in an error: BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1 Here is an analysis of the race as seen on a MIPS device. On this particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until initialized, at which point it becomes PFN 5120: 1. wq_sysfs_init calls into kobject_uevent_env at core_initcall: kobject_uevent_env+0x7e4/0x7ec kset_register+0x68/0x88 bus_register+0xdc/0x34c subsys_virtual_register+0x34/0x78 wq_sysfs_init+0x1c/0x4c do_one_initcall+0x50/0x1a8 kernel_init_freeable+0x230/0x2c8 kernel_init+0x10/0x100 ret_from_kernel_thread+0x14/0x1c 2. kobject_uevent_env() calls call_usermodehelper_exec() which executes kernel_execve asynchronously. 3. Memory allocations in kernel_execve cause a page fault, bumping the MM reference counter: add_mm_counter_fast+0xb4/0xc0 handle_mm_fault+0x6e4/0xea0 __get_user_pages.part.78+0x190/0x37c __get_user_pages_remote+0x128/0x360 get_arg_page+0x34/0xa0 copy_string_kernel+0x194/0x2a4 kernel_execve+0x11c/0x298 call_usermodehelper_exec_async+0x114/0x194 4. In case zero_pfn has not been initialized yet, zap_pte_range does not decrement the MM_ANONPAGES RSS counter and the BUG message is triggered shortly afterwards when __mmdrop checks the ref counters: __mmdrop+0x98/0x1d0 free_bprm+0x44/0x118 kernel_execve+0x160/0x1d8 call_usermodehelper_exec_async+0x114/0x194 ret_from_kernel_thread+0x14/0x1c To avoid races such as described above, initialize init_zero_pfn at early_initcall level. Depending on the architecture, ZERO_PAGE is either constant or gets initialized even earlier, at paging_init, so there is no issue with initializing zero_pfn earlier. Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: stable@vger.kernel.org Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-30	ftrace: Check if pages were allocated before calling free_pages()	Steven Rostedt (VMware)	1	-3/+6
	It is possible that on error pg->size can be zero when getting its order, which would return a -1 value. It is dangerous to pass in an order of -1 to free_pages(). Check if order is greater than or equal to zero before calling free_pages(). Link: https://lore.kernel.org/lkml/20210330093916.432697c7@gandalf.local.home/ Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-30	MIPS: kernel: setup.c: fix compilation error	Mauri Sandberg	1	-1/+1
	With ath79_defconfig enabling CONFIG_MIPS_ELF_APPENDED_DTB gives a compilation error. This patch fixes it. Build log: ... CC kernel/locking/percpu-rwsem.o ../arch/mips/kernel/setup.c:46:39: error: conflicting types for '__appended_dtb' const char __section(".appended_dtb") __appended_dtb[0x100000]; ^~~~~~~~~~~~~~ In file included from ../arch/mips/kernel/setup.c:34: ../arch/mips/include/asm/bootinfo.h:118:13: note: previous declaration of '__appended_dtb' was here extern char __appended_dtb[]; ^~~~~~~~~~~~~~ CC fs/attr.o make[4]: *** [../scripts/Makefile.build:271: arch/mips/kernel/setup.o] Error 1 ... Root cause seems to be: Fixes: b83ba0b9df56 ("MIPS: of: Introduce helper function to get DTB") Signed-off-by: Mauri Sandberg <sandberg@mailfence.com> Reviewed-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: trivial@kernel.org Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-03-30	ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8	Jeremy Szu	1	-0/+1
	The HP EliteBook 640 G8 Notebook PC is using ALC236 codec which is using 0x02 to control mute LED and 0x01 to control micmute LED. Therefore, add a quirk to make it works. Signed-off-by: Jeremy Szu <jeremy.szu@canonical.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210330114428.40490-1-jeremy.szu@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-30	ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks	Takashi Iwai	1	-0/+6
	The recently added PM prepare and complete callbacks don't have the sanity check whether the card instance has been properly initialized, which may potentially lead to Oops. This patch adds the azx_is_pm_ready() call in each place appropriately like other PM callbacks. Fixes: f5dac54d9d93 ("ALSA: hda: Separate runtime and system suspend") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210329113059.25035-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-30	ALSA: hda: Re-add dropped snd_poewr_change_state() calls	Takashi Iwai	1	-0/+2
	The card power state change via snd_power_change_state() at the system suspend/resume seems dropped mistakenly during the PM code rewrite. The card power state doesn't play much role nowadays but it's still referred in a few places such as the HDMI codec driver. This patch restores them, but in a more appropriate place now in the prepare and complete callbacks. Fixes: f5dac54d9d93 ("ALSA: hda: Separate runtime and system suspend") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210329113059.25035-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-29	vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends	Jason Gunthorpe	1	-1/+1
	Compiling the nvlink stuff relies on the SPAPR_TCE_IOMMU otherwise there are compile errors: drivers/vfio/pci/vfio_pci_nvlink2.c:101:10: error: implicit declaration of function 'mm_iommu_put' [-Werror,-Wimplicit-function-declaration] ret = mm_iommu_put(data->mm, data->mem); As PPC only defines these functions when the config is set. Previously this wasn't a problem by chance as SPAPR_TCE_IOMMU was the only IOMMU that could have satisfied IOMMU_API on POWERNV. Fixes: 179209fa1270 ("vfio: IOMMU_API should be selected") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <0-v1-83dba9768fc3+419-vfio_nvlink2_kconfig_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-29	xtensa: fix uaccess-related livelock in do_page_fault	Max Filippov	1	-1/+4
	If a uaccess (e.g. get_user()) triggers a fault and there's a fault signal pending, the handler will return to the uaccess without having performed a uaccess fault fixup, and so the CPU will immediately execute the uaccess instruction again, whereupon it will livelock bouncing between that instruction and the fault handler. https://lore.kernel.org/lkml/20210121123140.GD48431@C02TD0UTHF1T.local/ Cc: stable@vger.kernel.org Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2021-03-29	drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()	Nirmoy Das	1	-1/+1
	Offset calculation wasn't correct as start addresses are in pfn not in bytes. CC: stable@vger.kernel.org Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-29	drm/amd/pm: no need to force MCLK to highest when no display connected	Evan Quan	1	-1/+2
	Correct the check for vblank short. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-29	PM: runtime: Fix race getting/putting suppliers at probe	Adrian Hunter	1	-1/+7
	pm_runtime_put_suppliers() must not decrement rpm_active unless the consumer is suspended. That is because, otherwise, it could suspend suppliers for an active consumer. That can happen as follows: static int driver_probe_device(struct device_driver drv, struct device dev) { int ret = 0; if (!device_is_registered(dev)) return -ENODEV; dev->can_match = true; pr_debug("bus: '%s': %s: matched device %s with driver %s\n", drv->bus->name, __func__, dev_name(dev), drv->name); pm_runtime_get_suppliers(dev); if (dev->parent) pm_runtime_get_sync(dev->parent); At this point, dev can runtime suspend so rpm_put_suppliers() can run, rpm_active becomes 1 (the lowest value). pm_runtime_barrier(dev); if (initcall_debug) ret = really_probe_debug(dev, drv); else ret = really_probe(dev, drv); Probe callback can have runtime resumed dev, and then runtime put so dev is awaiting autosuspend, but rpm_active is 2. pm_request_idle(dev); if (dev->parent) pm_runtime_put(dev->parent); pm_runtime_put_suppliers(dev); Now pm_runtime_put_suppliers() will put the supplier i.e. rpm_active 2 -> 1, but consumer can still be active. return ret; } Fix by checking the runtime status. For any status other than RPM_SUSPENDED, rpm_active can be considered to be "owned" by rpm_[get/put]_suppliers() and pm_runtime_put_suppliers() need do nothing. Reported-by: Asutosh Das <asutoshd@codeaurora.org> Fixes: 4c06c4e6cf63 ("driver core: Fix possible supplier PM-usage counter imbalance") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-29	PM: runtime: Fix ordering in pm_runtime_get_suppliers()	Adrian Hunter	1	-1/+1
	rpm_active indicates how many times the supplier usage_count has been incremented. Consequently it must be updated after pm_runtime_get_sync() of the supplier, not before. Fixes: 4c06c4e6cf63 ("driver core: Fix possible supplier PM-usage counter imbalance") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-29	ACPI: tables: x86: Reserve memory occupied by ACPI tables	Rafael J. Wysocki	4	-22/+62
	The following problem has been reported by George Kennedy: Since commit 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()") the following use after free occurs intermittently when ACPI tables are accessed. BUG: KASAN: use-after-free in ibft_init+0x134/0xc49 Read of size 4 at addr ffff8880be453004 by task swapper/0/1 CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1-7a7fd0d #1 Call Trace: dump_stack+0xf6/0x158 print_address_description.constprop.9+0x41/0x60 kasan_report.cold.14+0x7b/0xd4 __asan_report_load_n_noabort+0xf/0x20 ibft_init+0x134/0xc49 do_one_initcall+0xc4/0x3e0 kernel_init_freeable+0x5af/0x66b kernel_init+0x16/0x1d0 ret_from_fork+0x22/0x30 ACPI tables mapped via kmap() do not have their mapped pages reserved and the pages can be "stolen" by the buddy allocator. Apparently, on the affected system, the ACPI table in question is not located in "reserved" memory, like ACPI NVS or ACPI Data, that will not be used by the buddy allocator, so the memory occupied by that table has to be explicitly reserved to prevent the buddy allocator from using it. In order to address this problem, rearrange the initialization of the ACPI tables on x86 to locate the initial tables earlier and reserve the memory occupied by them. The other architectures using ACPI should not be affected by this change. Link: https://lore.kernel.org/linux-acpi/1614802160-29362-1-git-send-email-george.kennedy@oracle.com/ Reported-by: George Kennedy <george.kennedy@oracle.com> Tested-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
2021-03-29	ALSA: usb-audio: Apply sample rate quirk to Logitech Connect	Ikjoon Jang	1	-0/+1
	Logitech ConferenceCam Connect is a compound USB device with UVC and UAC. Not 100% reproducible but sometimes it keeps responding STALL to every control transfer once it receives get_freq request. This patch adds 046d:0x084c to a snd_usb_get_sample_rate_quirk list. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203419 Signed-off-by: Ikjoon Jang <ikjn@chromium.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210324105153.2322881-1-ikjn@chromium.org Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-29	drm/exynos/decon5433: Remove the unused include statements	Tian Tao	1	-1/+0
	This driver doesn't reference of_gpio.h, so drop it. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>