aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-08-21mm: include CMA pages in lowmem_reserve at bootDoug Berger1-1/+1
The lowmem_reserve arrays provide a means of applying pressure against allocations from lower zones that were targeted at higher zones. Its values are a function of the number of pages managed by higher zones and are assigned by a call to the setup_per_zone_lowmem_reserve() function. The function is initially called at boot time by the function init_per_zone_wmark_min() and may be called later by accesses of the /proc/sys/vm/lowmem_reserve_ratio sysctl file. The function init_per_zone_wmark_min() was moved up from a module_init to a core_initcall to resolve a sequencing issue with khugepaged. Unfortunately this created a sequencing issue with CMA page accounting. The CMA pages are added to the managed page count of a zone when cma_init_reserved_areas() is called at boot also as a core_initcall. This makes it uncertain whether the CMA pages will be added to the managed page counts of their zones before or after the call to init_per_zone_wmark_min() as it becomes dependent on link order. With the current link order the pages are added to the managed count after the lowmem_reserve arrays are initialized at boot. This means the lowmem_reserve values at boot may be lower than the values used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the ratio values are unchanged. In many cases the difference is not significant, but for example an ARM platform with 1GB of memory and the following memory layout cma: Reserved 256 MiB at 0x0000000030000000 Zone ranges: DMA [mem 0x0000000000000000-0x000000002fffffff] Normal empty HighMem [mem 0x0000000030000000-0x000000003fffffff] would result in 0 lowmem_reserve for the DMA zone. This would allow userspace to deplete the DMA zone easily. Funnily enough $ cat /proc/sys/vm/lowmem_reserve_ratio would fix up the situation because as a side effect it forces setup_per_zone_lowmem_reserve. This commit breaks the link order dependency by invoking init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages have the chance to be properly accounted in their zone(s) and allowing the lowmem_reserve arrays to receive consistent values. Fixes: bc22af74f271 ("mm: update min_free_kbytes from khugepaged after core initialization") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jason Baron <jbaron@akamai.com> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21squashfs: avoid bio_alloc() failure with 1Mbyte blocksPhillip Lougher1-1/+5
This is a regression introduced by the patch "migrate from ll_rw_block usage to BIO". Bio_alloc() is limited to 256 pages (1 Mbyte). This can cause a failure when reading 1 Mbyte block filesystems. The problem is a datablock can be fully (or almost uncompressed), requiring 256 pages, but, because blocks are not aligned to page boundaries, it may require 257 pages to read. Bio_kmalloc() can handle 1024 pages, and so use this for the edge condition. Fixes: 93e72b3c612a ("squashfs: migrate from ll_rw_block usage to BIO") Reported-by: Nicolas Prochazka <nicolas.prochazka@gmail.com> Reported-by: Tomoatsu Shimada <shimada@walbrix.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Cc: Philippe Liard <pliard@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Adrien Schildknecht <adrien+dev@schischi.me> Cc: Daniel Rosenberg <drosen@google.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200815035637.15319-1-phillip@squashfs.org.uk Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21uprobes: __replace_page() avoid BUG in munlock_vma_page()Hugh Dickins1-1/+1
syzbot crashed on the VM_BUG_ON_PAGE(PageTail) in munlock_vma_page(), when called from uprobes __replace_page(). Which of many ways to fix it? Settled on not calling when PageCompound (since Head and Tail are equals in this context, PageCompound the usual check in uprobes.c, and the prior use of FOLL_SPLIT_PMD will have cleared PageMlocked already). Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [5.4+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008161338360.20413@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21kernel/relay.c: fix memleak on destroy relay channelWei Yongjun1-0/+1
kmemleak report memory leak as follows: unreferenced object 0x607ee4e5f948 (size 8): comm "syz-executor.1", pid 2098, jiffies 4295031601 (age 288.468s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: relay_open kernel/relay.c:583 [inline] relay_open+0xb6/0x970 kernel/relay.c:563 do_blk_trace_setup+0x4a8/0xb20 kernel/trace/blktrace.c:557 __blk_trace_setup+0xb6/0x150 kernel/trace/blktrace.c:597 blk_trace_ioctl+0x146/0x280 kernel/trace/blktrace.c:738 blkdev_ioctl+0xb2/0x6a0 block/ioctl.c:613 block_ioctl+0xe5/0x120 fs/block_dev.c:1871 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x170/0x1ce fs/ioctl.c:739 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 'chan->buf' is malloced in relay_open() by alloc_percpu() but not free while destroy the relay channel. Fix it by adding free_percpu() before return from relay_destroy_channel(). Fixes: 017c59c042d0 ("relay: Use per CPU constructs for the relay channel buffer pointers") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David Rientjes <rientjes@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Akash Goel <akash.goel@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200817122826.48518-1-weiyongjun1@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21romfs: fix uninitialized memory leak in romfs_dev_read()Jann Horn1-3/+1
romfs has a superblock field that limits the size of the filesystem; data beyond that limit is never accessed. romfs_dev_read() fetches a caller-supplied number of bytes from the backing device. It returns 0 on success or an error code on failure; therefore, its API can't represent short reads, it's all-or-nothing. However, when romfs_dev_read() detects that the requested operation would cross the filesystem size limit, it currently silently truncates the requested number of bytes. This e.g. means that when the content of a file with size 0x1000 starts one byte before the filesystem size limit, ->readpage() will only fill a single byte of the supplied page while leaving the rest uninitialized, leaking that uninitialized memory to userspace. Fix it by returning an error code instead of truncating the read when the requested read operation would go beyond the end of the filesystem. Fixes: da4458bda237 ("NOMMU: Make it possible for RomFS to use MTD devices directly") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200818013202.2246365-1-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21mm/rodata_test.c: fix missing function declarationLeon Romanovsky1-0/+1
The compilation with CONFIG_DEBUG_RODATA_TEST set produces the following warning due to the missing include. mm/rodata_test.c:15:6: warning: no previous prototype for 'rodata_test' [-Wmissing-prototypes] 15 | void rodata_test(void) | ^~~~~~~~~~~ Fixes: 2959a5f726f6 ("mm: add arch-independent testcases for RODATA") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lkml.kernel.org/r/20200819080026.918134-1-leon@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21mm/vunmap: add cond_resched() in vunmap_pmd_rangeAneesh Kumar K.V1-0/+2
Like zap_pte_range add cond_resched so that we can avoid softlockups as reported below. On non-preemptible kernel with large I/O map region (like the one we get when using persistent memory with sector mode), an unmap of the namespace can report below softlockups. 22724.027334] watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [ndctl:50777] NIP [c0000000000dc224] plpar_hcall+0x38/0x58 LR [c0000000000d8898] pSeries_lpar_hpte_invalidate+0x68/0xb0 Call Trace: flush_hash_page+0x114/0x200 hpte_need_flush+0x2dc/0x540 vunmap_page_range+0x538/0x6f0 free_unmap_vmap_area+0x30/0x70 remove_vm_area+0xfc/0x140 __vunmap+0x68/0x270 __iounmap.part.0+0x34/0x60 memunmap+0x54/0x70 release_nodes+0x28c/0x300 device_release_driver_internal+0x16c/0x280 unbind_store+0x124/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 __vfs_write+0x3c/0x70 vfs_write+0xd8/0x260 ksys_write+0xdc/0x130 system_call+0x5c/0x70 Reported-by: Harish Sriram <harish@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200807075933.310240-1-aneesh.kumar@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()Hugh Dickins1-1/+1
syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in __khugepaged_enter(): yes, when one thread is about to dump core, has set core_state, and is waiting for others, another might do something calling __khugepaged_enter(), which now crashes because I lumped the core_state test (known as "mmget_still_valid") into khugepaged_test_exit(). I still think it's best to lump them together, so just in this exceptional case, check mm->mm_users directly instead of khugepaged_test_exit(). Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Yang Shi <shy828301@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Song Liu <songliubraving@fb.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> [4.8+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008141503370.18085@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21hugetlb_cgroup: convert comma to semicolonXu Wang1-2/+2
Replace a comma between expression statements by a semicolon. Fixes: faced7e0806cf4 ("mm: hugetlb controller for cgroups v2") Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Giuseppe Scrivano <gscrivan@redhat.com> Link: http://lkml.kernel.org/r/20200818064333.21759-1-vulab@iscas.ac.cn Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21mailmap: add Andi KleenNick Desaulniers1-0/+1
I keep getting bounce back from the suse.de address. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Quentin Perret <qperret@qperret.net> Link: http://lkml.kernel.org/r/20200818203214.659955-1-ndesaulniers@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-19lib/string.c: Use freestanding environmentArvind Sankar1-1/+6
gcc can transform the loop in a naive implementation of memset/memcpy etc into a call to the function itself. This optimization is enabled by -ftree-loop-distribute-patterns. This has been the case for a while, but gcc-10.x enables this option at -O2 rather than -O3 as in previous versions. Add -ffreestanding, which implicitly disables this optimization with gcc. It is unclear whether clang performs such optimizations, but hopefully it will also not do so in a freestanding environment. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-19x86/boot/compressed: Use builtin mem functions for decompressorArvind Sankar2-9/+3
Since commits c041b5ad8640 ("x86, boot: Create a separate string.h file to provide standard string functions") fb4cac573ef6 ("x86, boot: Move memcmp() into string.h and string.c") the decompressor stub has been using the compiler's builtin memcpy, memset and memcmp functions, _except_ where it would likely have the largest impact, in the decompression code itself. Remove the #undef's of memcpy and memset in misc.c so that the decompressor code also uses the compiler builtins. The rationale given in the comment doesn't really apply: just because some functions use the out-of-line version is no reason to not use the builtin version in the rest. Replace the comment with an explanation of why memzero and memmove are being #define'd. Drop the suggestion to #undef in boot/string.h as well: the out-of-line versions are not really optimized versions, they're generic code that's good enough for the preboot environment. The compiler will likely generate better code for constant-size memcpy/memset/memcmp if it is allowed to. Most decompressors' performance is unchanged, with the exception of LZ4 and 64-bit ZSTD. Before After ARCH LZ4 73ms 10ms 32 LZ4 120ms 10ms 64 ZSTD 90ms 74ms 64 Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels. Decompressor code size has small differences, with the largest being that 64-bit ZSTD decreases just over 2k. The largest code size increase was on 64-bit XZ, of about 400 bytes. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Suggested-by: Nick Terrell <nickrterrell@gmail.com> Tested-by: Nick Terrell <nickrterrell@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-18mm/memory.c: skip spurious TLB flush for retried page faultYang Shi1-0/+3
Recently we found regression when running will_it_scale/page_fault3 test on ARM64. Over 70% down for the multi processes cases and over 20% down for the multi threads cases. It turns out the regression is caused by commit 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault"). The test mmaps a memory size file then write to the mapping, this would make all memory dirty and trigger dirty pages throttle, that upstream commit would release mmap_sem then retry the page fault. The retried page fault would see correct PTEs installed then just fall through to spurious TLB flush. The regression is caused by the excessive spurious TLB flush. It is fine on x86 since x86's spurious TLB flush is no-op. We could just skip the spurious TLB flush to mitigate the regression. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Xu Yu <xuyu@linux.alibaba.com> Debugged-by: Xu Yu <xuyu@linux.alibaba.com> Tested-by: Xu Yu <xuyu@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-17otx2_common: Use devm_kcalloc() in otx2_config_npa()Xu Wang1-2/+2
A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "devm_kcalloc". Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17net: qrtr: fix usage of idr in port assignment to socketNecip Fazil Yildiran1-9/+11
Passing large uint32 sockaddr_qrtr.port numbers for port allocation triggers a warning within idr_alloc() since the port number is cast to int, and thus interpreted as a negative number. This leads to the rejection of such valid port numbers in qrtr_port_assign() as idr_alloc() fails. To avoid the problem, switch to idr_alloc_u32() instead. Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Reported-by: syzbot+f31428628ef672716ea8@syzkaller.appspotmail.com Signed-off-by: Necip Fazil Yildiran <necip@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17selftests: disable rp_filter for icmp_redirect.shDavid Ahern1-0/+2
h1 is initially configured to reach h2 via r1 rather than the more direct path through r2. If rp_filter is set and inherited for r2, forwarding fails since the source address of h1 is reachable from eth0 vs the packet coming to it via r1 and eth1. Since rp_filter setting affects the test, explicitly reset it. Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17mailmap: Add WeiXiong LiaoKees Cook1-0/+1
WeiXiong Liao noted to me offlist that his preference for email address had changed and that he'd like it updated in the mailmap so people discussing pstore/blk would be able to reach him. Cc: WeiXiong Liao <gmpy.liaowx@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-08-17mailmap: Restore dictionary sortingKees Cook1-57/+57
Several names had been recently appended (instead of inserted). While git-shortlog doesn't need this file to be sorted, it helps humans to keep it organized this way. Sort the entire file (which includes some minor shuffling for dictionary order). Done with the following commands: grep -E '^(#|$)' .mailmap > .mailmap.head grep -Ev '^(#|$)' .mailmap > .mailmap.body sort -f .mailmap.body > .mailmap.body.sort cat .mailmap.head .mailmap.body.sort > .mailmap rm .mailmap.head .mailmap.body.sort Signed-off-by: Kees Cook <keescook@chromium.org>
2020-08-17arch/ia64: Restore arch-specific pgd_offset_k implementationJessica Clarke2-0/+11
IA-64 is special and treats pgd_offset_k() differently to pgd_offset(), using different formulae to calculate the indices into the kernel and user PGDs. The index into the user PGDs takes into account the region number, but the index into the kernel (init_mm) PGD always assumes a predefined kernel region number. Commit 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which incorrectly used pgd_index() for kernel page tables. As a result, the index into the kernel PGD was going out of bounds and the kernel hung during early boot. Allow overrides of pgd_offset_k() and override it on IA-64 with the old implementation that will correctly index the kernel PGD. Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-08-17Revert "net: xdp: pull ethernet header off packet after computing skb->protocol"David S. Miller1-1/+0
This reverts commit f8414a8d886b613b90d9fdf7cda6feea313b1069. eth_type_trans() does the necessary pull on the skb. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17phylink: <linux/phylink.h>: fix function prototype kernel-doc warningRandy Dunlap1-1/+2
Fix a kernel-doc warning for the pcs_config() function prototype: ../include/linux/phylink.h:406: warning: Excess function parameter 'permit_pause_to_mac' description in 'pcs_config' Fixes: 7137e18f6f88 ("net: phylink: add struct phylink_pcs") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17vfio/type1: Add proper error unwind for vfio_iommu_replay()Alex Williamson1-5/+66
The vfio_iommu_replay() function does not currently unwind on error, yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma structure to indicate IOMMU mapping. The IOMMU mappings are torn down when the domain is destroyed, but the other actions go on to cause trouble later. For example, the iommu->domain_list can be empty if we only have a non-IOMMU backed mdev attached. We don't currently check if the list is empty before getting the first entry in the list, which leads to a bogus domain pointer. If a vfio_dma entry is erroneously marked as iommu_mapped, we'll attempt to use that bogus pointer to retrieve the existing physical page addresses. This is the scenario that uncovered this issue, attempting to hot-add a vfio-pci device to a container with an existing mdev device and DMA mappings, one of which could not be pinned, causing a failure adding the new group to the existing container and setting the conditions for a subsequent attempt to explode. To resolve this, we can first check if the domain_list is empty so that we can reject replay of a bogus domain, should we ever encounter this inconsistent state again in the future. The real fix though is to add the necessary unwind support, which means cleaning up the current pinning if an IOMMU mapping fails, then walking back through the r-b tree of DMA entries, reading from the IOMMU which ranges are mapped, and unmapping and unpinning those ranges. To be able to do this, we also defer marking the DMA entry as IOMMU mapped until all entries are processed, in order to allow the unwind to know the disposition of each entry. Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices") Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17vfio-pci: Avoid recursive read-lock usageAlex Williamson2-24/+98
A down_read on memory_lock is held when performing read/write accesses to MMIO BAR space, including across the copy_to/from_user() callouts which may fault. If the user buffer for these copies resides in an mmap of device MMIO space, the mmap fault handler will acquire a recursive read-lock on memory_lock. Avoid this by reducing the lock granularity. Sequential accesses requiring multiple ioread/iowrite cycles are expected to be rare, therefore typical accesses should not see additional overhead. VGA MMIO accesses are expected to be non-fatal regardless of the PCI memory enable bit to allow legacy probing, this behavior remains with a comment added. ioeventfds are now included in memory access testing, with writes dropped while memory space is disabled. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17watch_queue: Limit the number of watches a user can holdDavid Howells2-0/+11
Impose a limit on the number of watches that a user can hold so that they can't use this mechanism to fill up all the available memory. This is done by putting a counter in user_struct that's incremented when a watch is allocated and decreased when it is released. If the number exceeds the RLIMIT_NOFILE limit, the watch is rejected with EAGAIN. This can be tested by the following means: (1) Create a watch queue and attach it to fd 5 in the program given - in this case, bash: keyctl watch_session /tmp/nlog /tmp/gclog 5 bash (2) In the shell, set the maximum number of files to, say, 99: ulimit -n 99 (3) Add 200 keyrings: for ((i=0; i<200; i++)); do keyctl newring a$i @s || break; done (4) Try to watch all of the keyrings: for ((i=0; i<200; i++)); do echo $i; keyctl watch_add 5 %:a$i || break; done This should fail when the number of watches belonging to the user hits 99. (5) Remove all the keyrings and all of those watches should go away: for ((i=0; i<200; i++)); do keyctl unlink %:a$i; done (6) Kill off the watch queue by exiting the shell spawned by watch_session. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-16mptcp: sendmsg: reset iter on error reduxFlorian Westphal1-3/+5
This fix wasn't correct: When this function is invoked from the retransmission worker, the iterator contains garbage and resetting it causes a crash. As the work queue should not be performance critical also zero the msghdr struct. Fixes: 35759383133f64d "(mptcp: sendmsg: reset iter on error)" Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16net: devlink: Remove overzealous WARN_ON with snapshotsAndrew Lunn1-1/+1
It is possible to trigger this WARN_ON from user space by triggering a devlink snapshot with an ID which already exists. We don't need both -EEXISTS being reported and spamming the kernel log. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16tipc: not enable tipc when ipv6 works as a moduleXin Long1-0/+1
When using ipv6_dev_find() in one module, it requires ipv6 not to work as a module. Otherwise, this error occurs in build: undefined reference to `ipv6_dev_find'. So fix it by adding "depends on IPV6 || IPV6=n" to tipc/Kconfig, as it does in sctp/Kconfig. Fixes: 5a6f6f579178 ("tipc: set ub->ifindex for local ipv6 address") Reported-by: kernel test robot <lkp@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16tipc: fix uninit skb->data in tipc_nl_compat_dumpit()Cong Wang1-1/+11
__tipc_nl_compat_dumpit() has two callers, and it expects them to pass a valid nlmsghdr via arg->data. This header is artificial and crafted just for __tipc_nl_compat_dumpit(). tipc_nl_compat_publ_dump() does so by putting a genlmsghdr as well as some nested attribute, TIPC_NLA_SOCK. But the other caller tipc_nl_compat_dumpit() does not, this leaves arg->data uninitialized on this call path. Fix this by just adding a similar nlmsghdr without any payload in tipc_nl_compat_dumpit(). This bug exists since day 1, but the recent commit 6ea67769ff33 ("net: tipc: prepare attrs in __tipc_nl_compat_dumpit()") makes it easier to appear. Reported-and-tested-by: syzbot+0e7181deafa7e0b79923@syzkaller.appspotmail.com Fixes: d0796d1ef63d ("tipc: convert legacy nl bearer dump to nl compat") Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16net: Fix potential wrong skb->protocol in skb_vlan_untag()Miaohe Lin1-2/+2
We may access the two bytes after vlan_hdr in vlan_set_encap_proto(). So we should pull VLAN_HLEN + sizeof(unsigned short) in skb_vlan_untag() or we may access the wrong data. Fixes: 0d5501c1c828 ("net: Always untag vlan-tagged traffic on input.") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16net: xdp: pull ethernet header off packet after computing skb->protocolJason A. Donenfeld1-0/+1
When an XDP program changes the ethernet header protocol field, eth_type_trans is used to recalculate skb->protocol. In order for eth_type_trans to work correctly, the ethernet header must actually be part of the skb data segment, so the code first pushes that onto the head of the skb. However, it subsequently forgets to pull it back off, making the behavior of the passed-on packet inconsistent between the protocol modifying case and the static protocol case. This patch fixes the issue by simply pulling the ethernet header back off of the skb head. Fixes: 297249569932 ("net: fix generic XDP to handle if eth header was mangled") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16ipvlan: fix device featuresMahesh Bandewar1-5/+22
Processing NETDEV_FEAT_CHANGE causes IPvlan links to lose NETIF_F_LLTX feature because of the incorrect handling of features in ipvlan_fix_features(). --before-- lpaa10:~# ethtool -k ipvl0 | grep tx-lockless tx-lockless: on [fixed] lpaa10:~# ethtool -K ipvl0 tso off Cannot change tcp-segmentation-offload Actual changes: vlan-challenged: off [fixed] tx-lockless: off [fixed] lpaa10:~# ethtool -k ipvl0 | grep tx-lockless tx-lockless: off [fixed] lpaa10:~# --after-- lpaa10:~# ethtool -k ipvl0 | grep tx-lockless tx-lockless: on [fixed] lpaa10:~# ethtool -K ipvl0 tso off Cannot change tcp-segmentation-offload Could not change any device features lpaa10:~# ethtool -k ipvl0 | grep tx-lockless tx-lockless: on [fixed] lpaa10:~# Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16bonding: fix a potential double-unregisterCong Wang1-1/+2
When we tear down a network namespace, we unregister all the netdevices within it. So we may queue a slave device and a bonding device together in the same unregister queue. If the only slave device is non-ethernet, it would automatically unregister the bonding device as well. Thus, we may end up unregistering the bonding device twice. Workaround this special case by checking reg_state. Fixes: 9b5e383c11b0 ("net: Introduce unregister_netdevice_many()") Reported-by: syzbot+af23e7f3e0a7e10c8b67@syzkaller.appspotmail.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16Linux 5.9-rc1Linus Torvalds1-2/+2
2020-08-16parisc: fix PMD pages allocation by restoring pmd_alloc_one()Mike Rapoport1-0/+6
Commit 1355c31eeb7e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()") converted parisc to use generic version of pmd_alloc_one() but it missed the fact that parisc uses order-1 pages for PMD. Restore the original version of pmd_alloc_one() for parisc, just use GFP_PGTABLE_KERNEL that implies __GFP_ZERO instead of GFP_KERNEL and memset. Fixes: 1355c31eeb7e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Meelis Roos <mroos@linux.ee> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lkml.kernel.org/r/9f2b5ebd-e4a4-0fa1-6cd3-4b9f6892d1ad@linux.ee Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-15io_uring: short circuit -EAGAIN for blocking read attemptJens Axboe1-1/+4
One case was missed in the short IO retry handling, and that's hitting -EAGAIN on a blocking attempt read (eg from io-wq context). This is a problem on sockets that are marked as non-blocking when created, they don't carry any REQ_F_NOWAIT information to help us terminate them instead of perpetually retrying. Fixes: 227c0c9673d8 ("io_uring: internally retry short reads") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15io_uring: sanitize double poll handlingJens Axboe1-9/+25
There's a bit of confusion on the matching pairs of poll vs double poll, depending on if the request is a pure poll (IORING_OP_POLL_ADD) or poll driven retry. Add io_poll_get_double() that returns the double poll waitqueue, if any, and io_poll_get_single() that returns the original poll waitqueue. With that, remove the argument to io_poll_remove_double(). Finally ensure that wait->private is cleared once the double poll handler has run, so that remove knows it's already been seen. Cc: stable@vger.kernel.org # v5.8 Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15can: j1939: add rxtimer for multipacket broadcast sessionZhang Changzhong1-8/+20
According to SAE J1939/21 (Chapter 5.12.3 and APPENDIX C), for transmit side the required time interval between packets of a multipacket broadcast message is 50 to 200 ms, the responder shall use a timeout of 250ms (provides margin allowing for the maximumm spacing of 200ms). For receive side a timeout will occur when a time of greater than 750 ms elapsed between two message packets when more packets were expected. So this patch fix and add rxtimer for multipacket broadcast session. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1596599425-5534-5-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-08-15can: j1939: abort multipacket broadcast session when timeout occursZhang Changzhong1-1/+1
If timeout occurs, j1939_tp_rxtimer() first calls hrtimer_start() to restart rxtimer, and then calls __j1939_session_cancel() to set session->state = J1939_SESSION_WAITING_ABORT. At next timeout expiration, because of the J1939_SESSION_WAITING_ABORT session state j1939_tp_rxtimer() will call j1939_session_deactivate_activate_next() to deactivate current session, and rxtimer won't be set. But for multipacket broadcast session, __j1939_session_cancel() don't set session->state = J1939_SESSION_WAITING_ABORT, thus current session won't be deactivate and hrtimer_start() is called to start new rxtimer again and again. So fix it by moving session->state = J1939_SESSION_WAITING_ABORT out of if (!j1939_cb_is_broadcast(&session->skcb)) statement. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1596599425-5534-4-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-08-15can: j1939: cancel rxtimer on multipacket broadcast session completeZhang Changzhong1-0/+1
If j1939_xtp_rx_dat_one() receive last frame of multipacket broadcast message, j1939_session_timers_cancel() should be called to cancel rxtimer. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1596599425-5534-3-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-08-15can: j1939: fix support for multipacket broadcast messageZhang Changzhong1-3/+14
Currently j1939_tp_im_involved_anydir() in j1939_tp_recv() check the previously set flags J1939_ECU_LOCAL_DST and J1939_ECU_LOCAL_SRC of incoming skb, thus multipacket broadcast message was aborted by receive side because it may come from remote ECUs and have no exact dst address. Similarly, j1939_tp_cmd_recv() and j1939_xtp_rx_dat() didn't process broadcast message. So fix it by checking and process broadcast message in j1939_tp_recv(), j1939_tp_cmd_recv() and j1939_xtp_rx_dat(). Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1596599425-5534-2-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-08-14net: fddi: skfp: cfm: Remove seemingly unused variable 'ID_sccs'Lee Jones1-4/+0
This variable is present in many source files and has not been used anywhere (at least internally) since it was introduced. Fixes the following W=1 kernel build warning(s): drivers/net/fddi/skfp/cfm.c: In function ‘cfm’: drivers/net/fddi/skfp/cfm.c:211:6: warning: variable ‘oldstate’ set but not used [-Wunused-but-set-variable] drivers/net/fddi/skfp/cfm.c:40:19: warning: ‘ID_sccs’ defined but not used [-Wunused-const-variable=] Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: fddi: skfp: cfm: Remove set but unused variable 'oldstate'Lee Jones1-11/+2
While we're at it, remove some code which has never been invoked. Keep the comment though, as it seems potentially half useful. Fixes the following W=1 kernel build warning(s): drivers/net/fddi/skfp/cfm.c: In function ‘cfm’: drivers/net/fddi/skfp/cfm.c:211:6: warning: variable ‘oldstate’ set but not used [-Wunused-but-set-variable] Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: fddi: skfp: smt: Remove seemingly unused variable 'ID_sccs'Lee Jones1-4/+0
This variable is present in many source files and has not been used anywhere (at least internally) since it was introduced. Fixes the following W=1 kernel build warning(s): drivers/net/fddi/skfp/smt.c:24:19: warning: ‘ID_sccs’ defined but not used [-Wunused-const-variable=] Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: fddi: skfp: smt: Place definition of 'smt_pdef' under same stipulations as its useLee Jones1-1/+2
The variable 'smt_pdef' is only used if LITTLE_ENDIAN is set, so only define it if this is the case. Fixes the following W=1 kernel build warning(s): drivers/net/fddi/skfp/smt.c:1572:3: warning: ‘smt_pdef’ defined but not used [-Wunused-const-variable=] Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: fddi: skfp: fplustm: Remove seemingly unused variable 'ID_sccs'Lee Jones1-4/+0
This variable is present in many source files and has not been used anywhere (at least internally) since it was introduced. Fixes the following W=1 kernel build warning(s): drivers/net/fddi/skfp/fplustm.c:25:19: warning: ‘ID_sccs’ defined but not used [-Wunused-const-variable=] Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: netdev@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: fddi: skfp: hwmtm: Remove seemingly unused variable 'ID_sccs'Lee Jones1-4/+0
This variable is present in many source files and has not been used anywhere (at least internally) since it was introduced. Fixes the following W=1 kernel build warning(s): drivers/net/fddi/skfp/hwmtm.c:14:19: warning: ‘ID_sccs’ defined but not used [-Wunused-const-variable=] Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: wan: dlci: Remove set but not used variable 'err'Lee Jones1-2/+1
Fixes the following W=1 kernel build warning(s): drivers/net/wan/dlci.c: In function ‘dlci_close’: drivers/net/wan/dlci.c:298:8: warning: variable ‘err’ set but not used [-Wunused-but-set-variable] Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Mike McLagan <mike.mclagan@linux.org> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: ethernet: 8390: axnet_cs: Document unused parameter 'txqueue'Lee Jones1-0/+1
Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/8390/axnet_cs.c:907: warning: Function parameter or member 'txqueue' not described in 'axnet_tx_timeout' Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Martin Habets <mhabets@solarflare.com> Cc: Shannon Nelson <snelson@pensando.io> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: William Lee <william@asix.com.tw> Cc: "A. Hinds --" <dahinds@users.sourceforge.net> Cc: reached at <becker@scyld.com> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: bonding: bond_alb: Describe alb_handle_addr_collision_on_attach()'s 'bond' and 'addr' paramsLee Jones1-2/+2
Fixes the following W=1 kernel build warning(s): drivers/net/bonding/bond_alb.c:1222: warning: Function parameter or member 'bond' not described in 'alb_set_mac_address' Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: ethernet: 3com: 3c574_cs: Remove set but unused variables 'tx' and 'rx'Lee Jones1-3/+3
Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/3com/3c574_cs.c: In function ‘update_stats’: drivers/net/ethernet/3com/3c574_cs.c:954:9: warning: variable ‘tx’ set but not used [-Wunused-but-set-variable] 954 | u8 rx, tx, up; | ^~ drivers/net/ethernet/3com/3c574_cs.c:954:5: warning: variable ‘rx’ set but not used [-Wunused-but-set-variable] 954 | u8 rx, tx, up; | ^~ Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Shannon Nelson <snelson@pensando.io> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Martin Habets <mhabets@solarflare.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Donald Becker <becker@scyld.com> Cc: David Hinds <dahinds@users.sourceforge.net> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>