2018-10-05 ipc/shm.c: use ERR_CAST() for shm_lock() error return (Kees Cook; 1 file, -1/+1)
This uses ERR_CAST() instead of an open-coded cast, as it is casting across structure pointers, which upsets __randomize_layout: ipc/shm.c: In function `shm_lock': ipc/shm.c:209:9: note: randstruct: casting between randomized structure pointer types (ssa): `struct shmid_kernel' and `struct kern_ipc_perm' return (void *)ipcp; ^~~~~~~~~~~~ Link: http://lkml.kernel.org/r/20180919180722.GA15073@beast Fixes: 82061c57ce93 ("ipc: drop ipc_lock()") Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
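The pattern described above can be sketched in userspace C. This is only an approximation of the helpers in the kernel's <linux/err.h>, and the names perm_demo, shm_demo, lookup_perm_demo() and shm_lock_demo() are hypothetical stand-ins, not the code in ipc/shm.c. The point it illustrates is that the error pointer is re-typed through void * rather than cast directly from one structure pointer type to another.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace approximation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error values occupy the last MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* Re-type an error pointer via void *, with no struct-to-struct cast. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

/* Hypothetical stand-ins for struct kern_ipc_perm and struct shmid_kernel. */
struct perm_demo { int id; };
struct shm_demo { struct perm_demo perm; int size; };

static struct shm_demo demo_segment = { .perm = { .id = 1 }, .size = 4096 };

/* Hypothetical lookup: only id 1 exists, everything else is -EINVAL. */
static struct perm_demo *lookup_perm_demo(int id)
{
	if (id != demo_segment.perm.id)
		return ERR_PTR(-EINVAL);
	return &demo_segment.perm;
}

static struct shm_demo *shm_lock_demo(int id)
{
	struct perm_demo *ipcp = lookup_perm_demo(id);

	if (IS_ERR(ipcp))
		return ERR_CAST(ipcp);	/* instead of an open-coded (void *)ipcp */

	/* Success path: step back from the member to its containing struct. */
	return (struct shm_demo *)((char *)ipcp - offsetof(struct shm_demo, perm));
}
```

A caller would check IS_ERR() on the result of shm_lock_demo() and recover the error code with PTR_ERR().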
2018-10-05 mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl (YueHaibing; 1 file, -1/+2)
get_user_pages_fast() returns a negative value if no pages were pinned; that value is then converted to an unsigned type and compared to zero, giving the wrong result. Link: http://lkml.kernel.org/r/20180921095015.26088-1-yuehaibing@huawei.com Fixes: 09e35a4a1ca8 ("mm/gup_benchmark: handle gup failures") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
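A minimal userspace sketch of this class of bug follows. The helper pin_pages_demo() is a hypothetical stand-in for a failing get_user_pages_fast() call, and the two checks are illustrative, not the code in __gup_benchmark_ioctl. Keeping the result in a signed variable is one way to make the comparison behave as intended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a failing call: returns a negative errno-style value. */
static int pin_pages_demo(void)
{
	return -14;	/* the value of -EFAULT on Linux */
}

/* Buggy pattern: the negative result is converted to unsigned long first. */
static bool stops_on_error_unsigned(void)
{
	unsigned long nr = pin_pages_demo();	/* wraps to a huge positive value */

	return nr <= 0;	/* for an unsigned type this is only true when nr == 0 */
}

/* Fixed pattern: keep the result signed so negative values are seen. */
static bool stops_on_error_signed(void)
{
	int nr = pin_pages_demo();

	return nr <= 0;
}
```

With the unsigned variable the error is not detected and a loop guarded by this check would keep going; with the signed variable the check fires.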
2018-10-05 mm, thp: fix mlocking THP page with migration enabled (Kirill A. Shutemov; 2 files, -1/+4)
A transparent huge page is represented by a single entry on an LRU list. Therefore, we can only make unevictable an entire compound page, not individual subpages.

If a user tries to mlock() part of a huge page, we want the rest of the page to be reclaimable. We handle this by keeping PTE-mapped huge pages on normal LRU lists: the PMD on border of VM_LOCKED VMA will be split into PTE table.

Introduction of THP migration breaks[1] the rules around mlocking THP pages. If we had a single PMD mapping of the page in mlocked VMA, the page will get mlocked, regardless of PTE mappings of the page.

For tmpfs/shmem it's easy to fix by checking PageDoubleMap() in remove_migration_pmd().

Anon THP pages can only be shared between processes via fork(). Mlocked page can only be shared if parent mlocked it before forking, otherwise CoW will be triggered on mlock(). For Anon-THP, we can fix the issue by munlocking the page on removing PTE migration entry for the page. PTEs for the page will always come after mlocked PMD: rmap walks VMAs from oldest to newest.

Test-case:

    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <linux/mempolicy.h>
    #include <numaif.h>

    int main(void)
    {
            unsigned long nodemask = 4;
            void *addr;

            addr = mmap((void *)0x20000000UL, 2UL << 20, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);

            if (fork()) {
                    wait(NULL);
                    return 0;
            }

            mlock(addr, 4UL << 10);
            mbind(addr, 2UL << 20, MPOL_PREFERRED | MPOL_F_RELATIVE_NODES,
                  &nodemask, 4, MPOL_MF_MOVE);

            return 0;
    }

[1] https://lkml.kernel.org/r/CAOMGZ=G52R-30rZvhGxEbkTw7rLLwBGadVYeo--iizcD3upL3A@mail.gmail.com

Link: http://lkml.kernel.org/r/20180917133816.43995-1-kirill.shutemov@linux.intel.com Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05 ocfs2: fix crash in ocfs2_duplicate_clusters_by_page() (Larry Chen; 1 file, -4/+12)
ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages is dirty. When a page has not been written back, it is still in dirty state. If ocfs2_duplicate_clusters_by_page() is called against the dirty page, the crash happens. To fix this bug, we can just unlock the page and wait until it is no longer dirty. The following is the backtrace: kernel BUG at /root/code/ocfs2/refcounttree.c:2961! [exception RIP: ocfs2_duplicate_clusters_by_page+822] __ocfs2_move_extent+0x80/0x450 [ocfs2] ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] ocfs2_move_extents+0x180/0x3b0 [ocfs2] ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] ocfs2_ioctl+0x253/0x640 [ocfs2] do_vfs_ioctl+0x90/0x5f0 SyS_ioctl+0x74/0x80 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Once we find the page is dirty, we do not wait until it's clean; rather, we use write_one_page() to write it back. Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com [lchen@suse.com: update comments] Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Larry Chen <lchen@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05 hugetlb: take PMD sharing into account when flushing tlb/caches (Mike Kravetz; 1 file, -9/+44)
When fixing an issue with PMD sharing and migration, it was discovered via code inspection that other callers of huge_pmd_unshare potentially have an issue with cache and tlb flushing. Use the routine adjust_range_if_pmd_sharing_possible() to calculate worst case ranges for mmu notifiers. Ensure that this range is flushed if huge_pmd_unshare succeeds and unmaps a PUD_SIZE area. Link: http://lkml.kernel.org/r/20180823205917.16297-3-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05 mm: migration: fix migration of huge PMD shared pages (Mike Kravetz; 4 files, -5/+94)
The page migration code employs try_to_unmap() to try and unmap the source page. This is accomplished by using rmap_walk to find all vmas where the page is mapped. This search stops when page mapcount is zero. For shared PMD huge pages, the page map count is always 1 no matter the number of mappings. Shared mappings are tracked via the reference count of the PMD page. Therefore, try_to_unmap stops prematurely and does not completely unmap all mappings of the source page. This problem can result in data corruption as writes to the original source page can happen after contents of the page are copied to the target page. Hence, data is lost. This problem was originally seen as DB corruption of shared global areas after a huge page was soft offlined due to ECC memory errors. DB developers noticed they could reproduce the issue by (hotplug) offlining memory used to back huge pages. A simple testcase can reproduce the problem by creating a shared PMD mapping (note that this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using migrate_pages() to migrate process pages between nodes while continually writing to the huge pages being migrated. To fix, have the try_to_unmap_one routine check for huge PMD sharing by calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared mapping it will be 'unshared' which removes the page table entry and drops the reference on the PMD page. After this, flush caches and TLB. mmu notifiers are called before locking page tables, but we can not be sure of PMD sharing until page tables are locked. Therefore, check for the possibility of PMD sharing before locking so that notifiers can prepare for the worst possible case. 
Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com [mike.kravetz@oracle.com: make _range_in_vma() a static inline] Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05 Merge tag 'iommu-fixes-v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu (Greg Kroah-Hartman; 1 file, -1/+1)
Joerg writes: "IOMMU Fix for Linux v4.19-rc6 One important fix: - Fix a memory leak with AMD IOMMU when SME is active and a VM has assigned devices. In that case the complete guest memory will be leaked without this fix." * tag 'iommu-fixes-v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Clear memory encryption mask from physical address
2018-10-05 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Greg Kroah-Hartman; 5 files, -67/+108)
Paolo writes: "KVM changes for 4.19-rc7 x86 and PPC bugfixes, mostly introduced in 4.19-rc1." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: nVMX: fix entry with pending interrupt if APICv is enabled KVM: VMX: hide flexpriority from guest when disabled at the module level KVM: VMX: check for existence of secondary exec controls before accessing KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault KVM: x86: fix L1TF's MMIO GFN calculation tools/kvm_stat: cut down decimal places in update interval dialog KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled KVM: x86: never trap MSR_KERNEL_GS_BASE
2018-10-05 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 (Greg Kroah-Hartman; 10 files, -55/+76)
Herbert writes: "Crypto Fixes for 4.19 This push fixes the following issues: - Out-of-bound stack access in qat. - Illegal schedule in mxs-dcp. - Memory corruption in chelsio. - Incorrect pointer computation in caam." * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: qat - Fix KASAN stack-out-of-bounds bug in adf_probe() crypto: mxs-dcp - Fix wait logic on chan threads crypto: chelsio - Fix memory corruption in DMA Mapped buffers. crypto: caam/jr - fix ablkcipher_edesc pointer arithmetic
2018-10-05 Merge tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 (Greg Kroah-Hartman; 4 files, -6/+31)
Steve writes: "SMB3 fixes four small SMB3 fixes: one for stable, the others to address a more recent regression" * tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix lease break problem introduced by compounding cifs: only wake the thread for the very last PDU in a compound cifs: add a warning if we try to to dequeue a deleted mid smb2: fix missing files in root share directory listing
2018-10-05 iommu/amd: Clear memory encryption mask from physical address (Singh, Brijesh; 1 file, -1/+1)
Boris Ostrovsky reported a memory leak with device passthrough when SME is active. The VFIO driver uses iommu_iova_to_phys() to get the physical address for an iova. This physical address is later passed into vfio_unmap_unpin() to unpin the memory. The vfio_unmap_unpin() uses pfn_valid() before unpinning the memory. The pfn_valid() check was failing because encryption mask was part of the physical address returned. This resulted in the memory not being unpinned and therefore leaked after the guest terminates. The memory encryption mask must be cleared from the physical address in iommu_iova_to_phys(). Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption") Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: <iommu@lists.linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
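The failure mode can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the encryption bit position (bit 47), the amount of RAM, and the helpers demo_sme_clr() and demo_pfn_valid(), which only mimic the idea of masking off the encryption bit (in the kernel, a helper such as __sme_clr() serves that role) and of a pfn_valid()-style range check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the encryption bit sits at bit 47; real hardware reports its position. */
#define DEMO_ME_MASK	(UINT64_C(1) << 47)
#define DEMO_PAGE_SHIFT	12
/* Assumption: 4 GiB of RAM, so valid page frame numbers are below 0x100000. */
#define DEMO_MAX_PFN	UINT64_C(0x100000)

/* Strip the memory encryption mask from a physical address. */
static uint64_t demo_sme_clr(uint64_t paddr)
{
	return paddr & ~DEMO_ME_MASK;
}

/* Simplified stand-in for pfn_valid(): is the page frame within RAM? */
static bool demo_pfn_valid(uint64_t paddr)
{
	return (paddr >> DEMO_PAGE_SHIFT) < DEMO_MAX_PFN;
}
```

An address that still carries the mask bit fails the validity check, which in the scenario described above is what prevented the unpin; after clearing the mask the same address passes.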
2018-10-05 Merge tag 'kvm-ppc-fixes-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master (Paolo Bonzini; 1 file, -0/+10)
Third set of PPC KVM fixes for 4.19 One patch here, fixing a potential host crash introduced (or at least exacerbated) by a previous fix for corruption relating to radix guest page faults and THP operations.
2018-10-04 Merge tag 'drm-fixes-2018-10-05' of git://anongit.freedesktop.org/drm/drm (Greg Kroah-Hartman; 7 files, -26/+75)
Dave writes: "amdgpu and two core fixes Two fixes for amdgpu: one corrects a use of process->mm one fix for display code race condition that can result in a crash Two core fixes: One for a use-after-free in the leasing code One for a cma/fbdev crash." * tag 'drm-fixes-2018-10-05' of git://anongit.freedesktop.org/drm/drm: drm/amdkfd: Fix incorrect use of process->mm drm/amd/display: Signal hw_done() after waiting for flip_done() drm/cma-helper: Fix crash in fbdev error path drm: fix use-after-free read in drm_mode_create_lease_ioctl()
2018-10-05 Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes (Dave Airlie; 2 files, -10/+37)
- Fix an ordering issue in DC with respect to atomic flips that could result in a crash - Fix incorrect use of process->mm in KFD Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/1538668374-22334-1-git-send-email-alexander.deucher@amd.com
2018-10-05 Merge tag 'drm-misc-fixes-2018-10-04' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes (Dave Airlie; 5 files, -16/+38)
drm-misc-fixes for v4.19-rc7: - Fix use-after-free in drm_mode_create_lease_ioctl() - Fix crash in fbdev error path. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/13b2c3ac-9a96-710e-ceb9-890af164f10e@linux.intel.com
2018-10-04 Merge tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs (Greg Kroah-Hartman; 10 files, -24/+30)
Miklos writes: "overlayfs fixes for 4.19-rc7 This update fixes a couple of regressions in the stacked file update added in this cycle, as well as some older bugs uncovered by syzkaller. There's also one trivial naming change that touches other parts of the fs subsystem." * tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix format of setxattr debug ovl: fix access beyond unterminated strings ovl: make symbol 'ovl_aops' static vfs: swap names of {do,vfs}_clone_file_range() ovl: fix freeze protection bypass in ovl_clone_file_range() ovl: fix freeze protection bypass in ovl_write_iter() ovl: fix memory leak on unlink of indexed file
2018-10-04 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm (Greg Kroah-Hartman; 2 files, -1/+2)
Russell writes: "A couple of small ARM fixes from Stefan and Thomas: - Adding the io_pgetevents syscall - Fixing a bounds check in pci_ioremap_io()" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8799/1: mm: fix pci_ioremap_io() offset check ARM: 8787/1: wire up io_pgetevents syscall
2018-10-04 Merge tag 'drm-fixes-2018-10-04' of git://anongit.freedesktop.org/drm/drm (Greg Kroah-Hartman; 6 files, -77/+85)
Dave writes: "drm exynos, tda9950 and intel fixes 3 i915 fixes: compressed error handling zlib fix compiler warning cleanup and a minor code cleanup 2 tda9950: Two fixes for the HDMI CEC 1 exynos: A fix required for IOMMU interaction." * tag 'drm-fixes-2018-10-04' of git://anongit.freedesktop.org/drm/drm: drm/i915: Handle incomplete Z_FINISH for compressed error states drm/i915: Avoid compiler warning for maybe unused gu_misc_iir drm/i915: Do not redefine the has_csr parameter. drm/exynos: Use selected dma_dev default iommu domain instead of a fake one drm/i2c: tda9950: set MAX_RETRIES for errors only drm/i2c: tda9950: fix timeout counter check
2018-10-04 Merge tag 'xfs-fixes-for-4.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux (Greg Kroah-Hartman; 18 files, -264/+256)
Dave writes: "XFS fixes for 4.19-rc6 Accumulated regression and bug fixes for 4.19-rc6, including:
o make iomap correctly mark dirty pages for sub-page block sizes
o fix regression in handling extent-to-btree format conversion errors
o fix torn log wrap detection for new logs
o various corrupt inode detection fixes
o various delalloc state fixes
o cleanup all the missed transaction cancel cases missed from changes merged in 4.19-rc1
o fix lockdep false positive on transaction allocation
o fix locking and reference counting on buffer log items"
* tag 'xfs-fixes-for-4.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix error handling in xfs_bmap_extents_to_btree iomap: set page dirty after partial delalloc on mkwrite xfs: remove invalid log recovery first/last cycle check xfs: validate inode di_forkoff xfs: skip delalloc COW blocks in xfs_reflink_end_cow xfs: don't treat unknown di_flags2 as corruption in scrub xfs: remove duplicated include from alloc.c xfs: don't bring in extents in xfs_bmap_punch_delalloc_range xfs: fix transaction leak in xfs_reflink_allocate_cow() xfs: avoid lockdep false positives in xfs_trans_alloc xfs: refactor xfs_buf_log_item reference count handling xfs: clean up xfs_trans_brelse() xfs: don't unlock invalidated buf on aborted tx commit xfs: remove last of unnecessary xfs_defer_cancel() callers xfs: don't crash the vfs on a garbage inline symlink
2018-10-04 Merge tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux (Greg Kroah-Hartman; 1 file, -1/+1)
Palmer writes: "A Single RISC-V Fix for 4.19-rc7 This tag contains a single patch that managed to get lost in the shuffle, which explains why it's so late. This single line has been floating around in various patch sets for months, and fixes our DMA32 region." * tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: RISCV: Fix end PFN for low memory
2018-10-04 drm/amdkfd: Fix incorrect use of process->mm (Felix Kuehling; 1 file, -8/+29)
This mm_struct pointer should never be dereferenced. If running in a user thread, just use current->mm. If running in a kernel worker use get_task_mm to get a safe reference to the mm_struct. Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-04 drm/amd/display: Signal hw_done() after waiting for flip_done() (Shirish S; 1 file, -2/+8)
In amdgpu_dm_commit_tail(), wait until flip_done() is signaled before we signal hw_done(). [Why] This is to temporarily address a paging error that occurs when a nonblocking commit contends with another commit, particularly in a mirrored display configuration where at least 2 CRTCs are updated. The error occurs in drm_atomic_helper_wait_for_flip_done(), when we attempt to access the contents of new_crtc_state->commit. Here's the sequence for a mirrored 2 display setup (irrelevant steps left out for clarity): **THREAD 1** | **THREAD 2** | Initialize atomic state for flip | | Queue worker | ... | Do work for flip | | Signal hw_done() on CRTC 1 | Signal hw_done() on CRTC 2 | | Wait for flip_done() on CRTC 1 <---- **PREEMPTED BY THREAD 1** Initialize atomic state for cursor | update (1) | | Do cursor update work on both CRTCs | | Clear atomic state (2) | **DONE** | ... | | Wait for flip_done() on CRTC 2 | *ERROR* | The issue starts with (1). When the atomic state is initialized, the current CRTC states are duplicated to be the new_crtc_states, and referenced to be the old_crtc_states. (The new_crtc_states are to be filled with update data.) Some things to note: * Due to the mirrored configuration, the cursor updates on both CRTCs. * At this point, the pflip IRQ has already been handled, and flip_done signaled on all CRTCs. The cursor commit can therefore continue. * The old_crtc_states used by the cursor update are the **same states** as the new_crtc_states used by the flip worker. At (2), the old_crtc_state is freed (*), and the cursor commit completes. We then context switch back to the flip worker, where we attempt to access the new_crtc_state->commit object. This is problematic, as this state has already been freed. 
(*) Technically, 'state->crtcs[i].state' is freed, which was made to reference old_crtc_state in drm_atomic_helper_swap_state() [How] By moving hw_done() after wait_for_flip_done(), we're guaranteed that the new_crtc_state (from the flip worker's perspective) still exists. This is because any other commit will be blocked, waiting for the hw_done() signal. Note that both the i915 and imx drivers have this sequence flipped already, masking this problem. Signed-off-by: Shirish S <shirish.s@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-04 kvm: nVMX: fix entry with pending interrupt if APICv is enabled (Paolo Bonzini; 1 file, -12/+26)
Commit b5861e5cf2fcf83031ea3e26b0a69d887adf7d21 introduced a check on the interrupt-window and NMI-window CPU execution controls in order to inject an external interrupt vmexit before the first guest instruction executes. However, when APIC virtualization is enabled the host does not need a vmexit in order to inject an interrupt at the next interrupt window; instead, it just places the interrupt vector in RVI and the processor will inject it as soon as possible. Therefore, on machines with APICv it is not enough to check the CPU execution controls: the same scenario can also happen if RVI>vPPR. Fixes: b5861e5cf2fcf83031ea3e26b0a69d887adf7d21 Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
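The RVI>vPPR condition can be sketched as a small predicate. This is an illustrative sketch, not the patch itself: the function name is hypothetical, and it assumes the comparison is made on the priority class (the upper four bits of each value), which is how virtual interrupt recognition is described for APIC virtualization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: with APIC virtualization, a virtual interrupt is
 * pending for delivery when the priority class of RVI (requesting virtual
 * interrupt) exceeds the priority class of vPPR (virtual processor priority).
 * Assumption: only bits 7:4 of each value take part in the comparison.
 */
static bool demo_apicv_has_interrupt(uint32_t rvi, uint32_t vppr)
{
	return (rvi & 0xf0u) > (vppr & 0xf0u);
}
```

Under that assumption, a vector in the same priority class as vPPR is not deliverable even if its low bits are larger.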
2018-10-04 ovl: fix format of setxattr debug (Miklos Szeredi; 1 file, -2/+2)
Format has a typo: it was meant to be "%.*s", not "%*s". But at some point callers grew nonprintable values as well, so use "%*pE" instead with a maximized length. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Cc: <stable@vger.kernel.org> # v4.12
2018-10-04 ovl: fix access beyond unterminated strings (Amir Goldstein; 1 file, -1/+1)
KASAN detected slab-out-of-bounds access in printk from overlayfs, because string format used %*s instead of %.*s. > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604 > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811 > > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36 ... > printk+0xa7/0xcf kernel/printk/printk.c:1996 > ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689 Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13
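The difference between the two format specifiers can be shown with standard C snprintf(), which treats them the same way as described above. The buffer names and format_name_demo() are hypothetical; the sketch only demonstrates that a precision bounds the read while a field width does not.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Four bytes with no terminating NUL, like a length-delimited name. */
static const char demo_raw[4] = { 'a', 'b', 'c', 'd' };
static char demo_out[32];

/* Format a length-delimited, possibly unterminated name into out. */
static int format_name_demo(char *out, size_t outlen, const char *name, int len)
{
	/*
	 * "%.*s": len is a precision, so at most len bytes of name are read.
	 * "%*s" would make len a minimum field width and keep reading name
	 * until a NUL byte is found, running past an unterminated buffer.
	 */
	return snprintf(out, outlen, "name=%.*s", len, name);
}
```

Calling format_name_demo(demo_out, sizeof(demo_out), demo_raw, 4) reads exactly the four bytes of demo_raw and nothing beyond them.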
2018-10-04 KVM: VMX: hide flexpriority from guest when disabled at the module level (Paolo Bonzini; 1 file, -1/+5)
As of commit 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0, whereas previously KVM would allow a nested guest to enable VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware. That is, KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't (always) allow setting it when kvm-intel.flexpriority=0, and may even initially allow the control and then clear it when the nested guest writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause functional issues. Hide the control completely when the module parameter is cleared. reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Fixes: 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings") Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 KVM: VMX: check for existence of secondary exec controls before accessing (Sean Christopherson; 1 file, -3/+4)
Return early from vmx_set_virtual_apic_mode() if the processor doesn't support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of which reside in SECONDARY_VM_EXEC_CONTROL. This eliminates warnings due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing on processors without secondary exec controls. Remove the similar check for TPR shadowing as it is incorporated in the flexpriority_enabled check and the APIC-related code in vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE. Reported-by: Gerhard Wiesinger <redhat@wiesinger.com> Fixes: 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings") Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault (Paul Mackerras; 1 file, -0/+10)
Commit 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size", 2018-09-11) added a call to __find_linux_pte() and a dereference of the returned PTE pointer to the radix page fault path in the common case where the page is normal system memory. Previously, __find_linux_pte() was only called for mappings to physical addresses which don't have a page struct (e.g. memory-mapped I/O) or where the page struct is marked as reserved memory. This exposes us to the possibility that the returned PTE pointer could be NULL, for example in the case of a concurrent THP collapse operation. Dereferencing the returned NULL pointer causes a host crash. To fix this, we check for NULL, and if it is NULL, we retry the operation by returning to the guest, with the expectation that it will generate the same page fault again (unless of course it has been fixed up by another CPU in the meantime). Fixes: 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-04 Merge branch 'drm-tda9950-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm into drm-fixes (Dave Airlie; 1 file, -2/+3)
two tda9950 fixes. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Russell King <rmk@armlinux.org.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20181001162948.GA9508@rmk-PC.armlinux.org.uk
2018-10-04 Merge tag 'drm-intel-fixes-2018-10-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes (Dave Airlie; 4 files, -47/+76)
There's one fix for an incomplete zlib Z_FINISH in our error state handling, plus a compilation warning fix and a tiny code clean up. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181003202840.GA23560@intel.com
2018-10-03 Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net (Greg Kroah-Hartman; 129 files, -806/+864)
David writes: "Networking fixes:
1) Prefix length validation in xfrm layer, from Steffen Klassert.
2) TX status reporting fix in mac80211, from Andrei Otcheretianski.
3) Fix hangs due to TX_DROP in mac80211, from Bob Copeland.
4) Fix DMA error regression in b43, from Larry Finger.
5) Add input validation to xenvif_set_hash_mapping(), from Jan Beulich.
6) SMMU unmapping fix in hns driver, from Yunsheng Lin.
7) Bluetooth crash in unpairing on SMP, from Matias Karhumaa.
8) WoL handling fixes in the phy layer, from Heiner Kallweit.
9) Fix deadlock in bonding, from Mahesh Bandewar.
10) Fill ttl inherit info in vxlan driver, from Hangbin Liu.
11) Fix TX timeouts during netpoll, from Michael Chan.
12) RXRPC layer fixes from David Howells.
13) Another batch of ndo_poll_controller() removals to deal with excessive resource consumption during load. From Eric Dumazet.
14) Fix a specific TIPC failure scenario, from LUU Duc Canh.
15) Really disable clocks in r8169 during suspend so that low power states can actually be reached.
16) Fix SYN backlog lockdep issue in tcp and dccp, from Eric Dumazet.
17) Fix RCU locking in netpoll SKB send, which shows up in bonding, from Dave Jones.
18) Fix TX stalls in r8169, from Heiner Kallweit.
19) Fix lockups in nfp due to control message storms, from Jakub Kicinski.
20) Various rmnet bug fixes from Subash Abhinov Kasiviswanathan and Sean Tranchetti.
21) Fix use after free in ip_cmsg_recv_dstaddr(), from Eric Dumazet."
* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (122 commits) ixgbe: check return value of napi_complete_done() sctp: fix fall-through annotation r8169: always autoneg on resume ipv4: fix use-after-free in ip_cmsg_recv_dstaddr() net: qualcomm: rmnet: Fix incorrect allocation flag in receive path net: qualcomm: rmnet: Fix incorrect allocation flag in transmit net: qualcomm: rmnet: Skip processing loopback packets net: systemport: Fix wake-up interrupt race during resume rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 bonding: fix warning message inet: make sure to grab rcu_read_lock before using ireq->ireq_opt nfp: avoid soft lockups under control message storm declance: Fix continuation with the adapter identification message net: fec: fix rare tx timeout r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO tun: napi flags belong to tfile tun: initialize napi_mutex unconditionally tun: remove unused parameters bond: take rcu lock in netpoll_send_skb_on_dev rtnetlink: Fail dump if target netnsid is invalid ...
2018-10-03 ixgbe: check return value of napi_complete_done() (Song Liu; 1 file, -5/+7)
The NIC driver should only enable interrupts when napi_complete_done() returns true. This patch adds the check for ixgbe. Cc: stable@vger.kernel.org # 4.10+ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Song Liu <songliubraving@fb.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-03 Merge tag 'linux-kselftest-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest (Greg Kroah-Hartman; 1 file, -9/+10)
Shuah writes: "kselftest fixes for 4.19-rc7 This fixes update for 4.19-rc7 consists of one fix to the rseq test to prevent it from seg-faulting when compiled with -fpie." * tag 'linux-kselftest-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: rseq/selftests: fix parametrized test with -fpie
2018-10-03 sctp: fix fall-through annotation (Gustavo A. R. Silva; 1 file, -1/+1)
Replace "fallthru" with a proper "fall through" annotation. This fix is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
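For readers unfamiliar with the annotation, here is a minimal sketch of an intentional fall-through marked with the comment form. The enum and function are hypothetical and unrelated to the sctp code; the comment documents that the missing break is deliberate, which is the kind of marker -Wimplicit-fallthrough looks for.

```c
#include <assert.h>

/* Hypothetical states, used only to show the annotation. */
enum demo_state { DEMO_INIT, DEMO_READY, DEMO_DONE };

/* Count how many steps remain before reaching DEMO_DONE. */
static int demo_steps_remaining(enum demo_state s)
{
	int steps = 0;

	switch (s) {
	case DEMO_INIT:
		steps++;
		/* fall through */
	case DEMO_READY:
		steps++;
		break;
	case DEMO_DONE:
		break;
	}
	return steps;
}
```

Starting from DEMO_INIT, control deliberately continues into the DEMO_READY case, so both increments run.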
2018-10-03drm/i915: Handle incomplete Z_FINISH for compressed error statesChris Wilson2-25/+64
The final call to zlib_deflate(Z_FINISH) may require more output space to be allocated and so needs to be re-invoked. Failure to do so in the current code leads to incomplete zlib streams (albeit intact due to the use of Z_SYNC_FLUSH) resulting in the occasional short object capture. v2: Check against overrunning our pre-allocated page array v3: Drop Z_SYNC_FLUSH entirely Testcase: igt/i915-error-capture.js Fixes: 0a97015d45ee ("drm/i915: Compress GPU objects in error state") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181003082422.23214-1-chris@chris-wilson.co.uk (cherry picked from commit 83bc0f5b432f60394466deef16fc753e27371d0b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
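The retry pattern described above can be sketched with a mock compressor. Everything here is an assumption made for illustration: `mock_finish()` only imitates the behaviour of a finishing deflate call (report "done" or "needs more output space"), and the page size and page limit are arbitrary. It is not the i915 code and it does not call zlib.

```c
#include <assert.h>
#include <stddef.h>

enum { MOCK_OK, MOCK_STREAM_END, MOCK_PAGE = 16, MOCK_MAX_PAGES = 4 };

/* Hypothetical compressor state: bytes still buffered internally. */
struct mock_stream {
    size_t pending;     /* bytes the compressor has not emitted yet */
    size_t avail_out;   /* free space left in the current output page */
};

/* Mock of a finishing deflate call: emits what fits, reports whether done. */
static int mock_finish(struct mock_stream *s)
{
    size_t n = s->pending < s->avail_out ? s->pending : s->avail_out;

    s->pending -= n;
    s->avail_out -= n;
    return s->pending == 0 ? MOCK_STREAM_END : MOCK_OK;
}

/* Returns the number of pages used, or -1 if the page array would overrun. */
static int mock_flush_all(struct mock_stream *s)
{
    int pages = 1;

    assert(s != NULL);
    s->avail_out = MOCK_PAGE;
    while (mock_finish(s) != MOCK_STREAM_END) {
        if (pages == MOCK_MAX_PAGES)
            return -1;              /* do not overrun the preallocated array */
        pages++;
        s->avail_out = MOCK_PAGE;   /* supply a fresh output page, retry */
    }
    return pages;
}
```

Calling the finishing step only once would stop after the first page and leave `pending` non-zero, which corresponds to the truncated stream the commit message describes.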
2018-10-03Merge tag 'media/v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaGreg Kroah-Hartman3-18/+26
Mauro writes: "media fixes for v4.19-rc6" * tag 'media/v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: v4l: event: Prevent freeing event subscriptions while accessed
2018-10-03Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hidGreg Kroah-Hartman4-18/+12
Jiri writes: "HID fixes: - hantick touchpad fix from Anisse Astier - device ID addition for Ice Lake mobile from Srinivas Pandruvada - touchscreen resume fix for certain i2c-hid driven devices from Hans de Goede" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: intel-ish-hid: Enable Ice Lake mobile HID: i2c-hid: Remove RESEND_REPORT_DESCR quirk and its handling HID: i2c-hid: disable runtime PM operations on hantick touchpad
2018-10-03Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsGreg Kroah-Hartman1-11/+13
Al writes: "xattrs regression fix from Andreas; sat in -next for quite a while." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sysfs: Do not return POSIX ACL xattrs via listxattr
2018-10-03media: v4l: event: Prevent freeing event subscriptions while accessedSakari Ailus3-18/+26
The event subscriptions are added to the subscribed event list while holding a spinlock, but that lock is subsequently released while still accessing the subscription object. This makes it possible to unsubscribe the event --- and free the subscription object's memory --- while the subscription object is simultaneously accessed. Prevent this by adding a mutex to serialise the event subscription and unsubscription. This also gives a guarantee to the callback ops that the add op has returned before the del op is called. This change also results in making the elems field less special: subscriptions are only added to the event list once they are fully initialised. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: stable@vger.kernel.org # for 4.14 and up Fixes: c3b5b0241f62 ("V4L/DVB: V4L: Events: Add backend") Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2018-10-03Merge tag 'exynos-drm-fixes-for-v4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixesDave Airlie1-28/+6
Use default iommu domain instead of fake one - This patch reuses the default IOMMU domain instead of allocating a fake IOMMU domain, and allows some design changes for enhancement of the IOMMU framework[1] without breaking Exynos DRM. [1] https://www.spinics.net/lists/arm-kernel/msg676098.html Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/1538360696-23579-1-git-send-email-inki.dae@samsung.com
2018-10-02r8169: always autoneg on resumeAlex Xu (Hello71)1-4/+3
This affects at least versions 25 and 33, so assume all cards are broken and just renegotiate by default. Fixes: 10bc6a6042c9 ("r8169: fix autoneg issue on resume with RTL8168E") Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()Eric Dumazet1-2/+1
Caching ip_hdr(skb) before a call to pskb_may_pull() is buggy, do not do it. Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
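Why caching the header pointer is buggy can be shown with a userspace analogy. The sketch below assumes a hypothetical `struct mock_buf` and a `mock_may_pull()` helper that, like a pull operation, may move the data to a new allocation; neither is a kernel API. The safe pattern is to fetch the header pointer only after the call that may reallocate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer; not the kernel's struct sk_buff. */
struct mock_buf {
    unsigned char *data;
    size_t len;
};

/*
 * Mock of a "may pull" helper: guarantees at least `need` bytes are
 * addressable, possibly by moving the data to a new allocation.
 * Returns 1 on success, 0 on failure.
 */
static int mock_may_pull(struct mock_buf *b, size_t need)
{
    unsigned char *p;

    if (need <= b->len)
        return 1;
    p = calloc(1, need);        /* always move, to model the worst case */
    if (!p)
        return 0;
    memcpy(p, b->data, b->len);
    free(b->data);              /* any pointer cached earlier now dangles */
    b->data = p;
    b->len = need;
    return 1;
}

/* Reads the first byte of the header; returns -1 if the pull fails. */
static int mock_first_header_byte(struct mock_buf *b, size_t need)
{
    assert(b != NULL && b->data != NULL);
    if (!mock_may_pull(b, need))
        return -1;
    /* Fetch the header pointer only after the pull, never before it. */
    return b->data[0];
}
```

Had the function saved `b->data` in a local variable before calling `mock_may_pull()`, the later read would touch freed memory, which is the use-after-free pattern the commit removes.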
2018-10-02Merge tag 'mlx5-fixes-2018-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller7-5/+76
Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-10-01 This pull request includes some fixes to mlx5 driver, Please pull and let me know if there's any problem. For -stable v4.11: "6e0a4a23c59a ('net/mlx5: E-Switch, Fix out of bound access when setting vport rate')" For -stable v4.18: "98d6627c372a ('net/mlx5e: Set vlan masks for all offloaded TC rules')" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02Merge branch 'rmnet-fixes'David S. Miller1-2/+5
Subash Abhinov Kasiviswanathan says: ==================== net: qualcomm: rmnet: Updates 2018-10-02 This series is a set of small fixes for rmnet driver Patch 1 is a fix for a scenario reported by syzkaller Patch 2 & 3 are fixes for incorrect allocation flags ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: qualcomm: rmnet: Fix incorrect allocation flag in receive pathSubash Abhinov Kasiviswanathan1-1/+1
The incoming skb needs to be reallocated in case the headroom is not sufficient to adjust the ethernet header. This allocation needs to be atomic otherwise it results in this splat [<600601bb>] ___might_sleep+0x185/0x1a3 [<603f6314>] ? _raw_spin_unlock_irqrestore+0x0/0x27 [<60069bb0>] ? __wake_up_common_lock+0x95/0xd1 [<600602b0>] __might_sleep+0xd7/0xe2 [<60065598>] ? enqueue_task_fair+0x112/0x209 [<600eea13>] __kmalloc_track_caller+0x5d/0x124 [<600ee9b6>] ? __kmalloc_track_caller+0x0/0x124 [<602696d5>] __kmalloc_reserve.isra.34+0x30/0x7e [<603f629b>] ? _raw_spin_lock_irqsave+0x0/0x3d [<6026b744>] pskb_expand_head+0xbf/0x310 [<6025ca6a>] rmnet_rx_handler+0x7e/0x16b [<6025c9ec>] ? rmnet_rx_handler+0x0/0x16b [<6027ad0c>] __netif_receive_skb_core+0x301/0x96f [<60033c17>] ? set_signals+0x0/0x40 [<6027bbcb>] __netif_receive_skb+0x24/0x8e Fixes: 74692caf1b0b ("net: qualcomm: rmnet: Process packets over ethernet") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
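The difference between a sleeping and an atomic allocation request can be modelled in userspace. The sketch below is only an analogy under stated assumptions: `MOCK_GFP_KERNEL`, `MOCK_GFP_ATOMIC`, `mock_alloc()` and `mock_rx_expand()` are hypothetical names, and the counter stands in for the "sleeping function called from invalid context" check that produces the splat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flags modelled on the idea of GFP_KERNEL and GFP_ATOMIC. */
enum mock_gfp { MOCK_GFP_KERNEL, MOCK_GFP_ATOMIC };

static bool mock_in_atomic;     /* true while "softirq" code is running */
static int mock_sleep_bugs;     /* counts sleeping calls from atomic context */

/* Mock allocator: a MOCK_GFP_KERNEL request is allowed to sleep. */
static void *mock_alloc(size_t size, enum mock_gfp flags)
{
    if (flags == MOCK_GFP_KERNEL && mock_in_atomic)
        mock_sleep_bugs++;      /* where the kernel would print the splat */
    return malloc(size);
}

/* Receive handler running in atomic context; grows headroom if needed. */
static void *mock_rx_expand(size_t size, enum mock_gfp flags)
{
    void *p;

    assert(size > 0);
    mock_in_atomic = true;
    p = mock_alloc(size, flags);
    mock_in_atomic = false;
    return p;
}
```

Passing the atomic flag from the handler keeps the counter at zero, while the sleeping flag trips it on every call made from the simulated atomic context.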
2018-10-02net: qualcomm: rmnet: Fix incorrect allocation flag in transmitSubash Abhinov Kasiviswanathan1-1/+1
The incoming skb needs to be reallocated in case the headroom is not sufficient to add the MAP header. This allocation needs to be atomic otherwise it results in the following splat [32805.801456] BUG: sleeping function called from invalid context [32805.841141] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [32805.904773] task: ffffffd7c5f62280 task.stack: ffffff80464a8000 [32805.910851] pc : ___might_sleep+0x180/0x188 [32805.915143] lr : ___might_sleep+0x180/0x188 [32806.131520] Call trace: [32806.134041] ___might_sleep+0x180/0x188 [32806.137980] __might_sleep+0x50/0x84 [32806.141653] __kmalloc_track_caller+0x80/0x3bc [32806.146215] __kmalloc_reserve+0x3c/0x88 [32806.150241] pskb_expand_head+0x74/0x288 [32806.154269] rmnet_egress_handler+0xb0/0x1d8 [32806.162239] rmnet_vnd_start_xmit+0xc8/0x13c [32806.166627] dev_hard_start_xmit+0x148/0x280 [32806.181181] sch_direct_xmit+0xa4/0x198 [32806.185125] __qdisc_run+0x1f8/0x310 [32806.188803] net_tx_action+0x23c/0x26c [32806.192655] __do_softirq+0x220/0x408 [32806.196420] do_softirq+0x4c/0x70 Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: qualcomm: rmnet: Skip processing loopback packetsSean Tranchetti1-0/+3
RMNET RX handler was processing invalid packets that were originally sent on the real device and were looped back via dev_loopback_xmit(). This was detected using syzkaller. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
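A rough sketch of such an early exit follows. The types and names (`struct mock_skb`, `MOCK_PACKET_LOOPBACK`, `mock_rx_handler()`) are hypothetical and the actual patch may differ in detail; the sketch only shows a receive handler that passes looped-back packets on untouched instead of processing them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet classification and handler results. */
enum mock_pkt_type { MOCK_PACKET_HOST, MOCK_PACKET_LOOPBACK };
enum mock_rx_result { MOCK_RX_PASS, MOCK_RX_CONSUMED };

/* Hypothetical packet descriptor; not the kernel's struct sk_buff. */
struct mock_skb {
    enum mock_pkt_type pkt_type;
};

/* Processes a packet unless it was looped back from the transmit path. */
static enum mock_rx_result mock_rx_handler(const struct mock_skb *skb,
                                           int *processed)
{
    assert(skb != NULL && processed != NULL);

    /* Looped-back copies were sent by us; leave them to the stack. */
    if (skb->pkt_type == MOCK_PACKET_LOOPBACK)
        return MOCK_RX_PASS;

    (*processed)++;
    return MOCK_RX_CONSUMED;
}
```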
2018-10-02net: systemport: Fix wake-up interrupt race during resumeFlorian Fainelli1-17/+11
The AON_PM_L2 is normally used to trigger and identify the source of a wake-up event. Since the RX_SYS clock is no longer turned off, we also have an interrupt being sent to the SYSTEMPORT INTRL_2_0 controller, and that interrupt remains active up until the magic packet detector is disabled which happens much later during the driver resumption. The race happens if we have a CPU that is entering the SYSTEMPORT INTRL2_0 handler during resume, and another CPU has managed to clear the wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we have the first CPU stuck in the interrupt handler with an interrupt cause that has been cleared under its feet, and so we keep returning IRQ_NONE and we never make any progress. This was not a problem before because we would always turn off the RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would also be turned off as well, thus not latching the interrupt. The fix is to make sure we do not enable either the MPD or BRCM_TAG_MATCH interrupts since those are redundant with what the AON_PM_L2 interrupt controller already processes and they would cause such a race to occur. Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER") Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02smb3: fix lease break problem introduced by compoundingSteve French1-2/+1
Fixes problem (discovered by Aurelien) introduced by recent commit: commit b24df3e30cbf48255db866720fb71f14bf9d2f39 ("cifs: update receive_encrypted_standard to handle compounded responses") which broke the ability to respond to some lease breaks (lease breaks being ignored is a problem since it can block server response for the duration of the lease break timeout). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-02cifs: only wake the thread for the very last PDU in a compoundRonnie Sahlberg1-1/+17
For compounded PDUs we should only wake the waiting thread for the very last PDU of the compound. We do this so that we are guaranteed that the demultiplex_thread will not process or access any of those MIDs any more once the send/recv thread starts processing. Else there is a race where at the end of the send/recv processing we will try to delete all the mids of the compound. If the multiplex thread still has other mids to process at this point for this compound this can lead to an oops. Needed to fix recent commit: commit 730928c8f4be88e9d6a027a16b1e8fa9c59fc077 ("cifs: update smb2_queryfs() to use compounding") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
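The ordering guarantee described above can be illustrated with a single-threaded model. `struct mock_mid` and `mock_demultiplex()` are hypothetical and do not mirror the cifs data structures; the sketch only shows that when the wake-up is tied to the last response of a compound, no response is left unprocessed at the moment the waiter starts running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-response record; not the cifs mid structure. */
struct mock_mid {
    bool received;
    bool last_in_compound;
};

/*
 * Mock demultiplex loop. Marks each response as received and wakes the
 * waiter only for the last response of the compound. Returns the number
 * of responses still unprocessed at the moment of the wake-up, or -1 if
 * no wake-up happened.
 */
static int mock_demultiplex(struct mock_mid *mids, size_t n)
{
    size_t i;

    assert(mids != NULL || n == 0);
    for (i = 0; i < n; i++) {
        mids[i].received = true;
        if (mids[i].last_in_compound)
            return (int)(n - i - 1);    /* wake the waiting thread here */
    }
    return -1;
}
```

A return value of zero means the waiter can delete every record of the compound without racing against the loop; waking on an earlier response would return a positive count, which corresponds to the window in which the oops could occur.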