aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-06-24mm/hmm: Poison hmm_range during unregisterJason Gunthorpe1-6/+8
Trying to misuse a range outside its lifetime is a kernel bug. Use poison bytes to help detect this condition. Double unregister will reliably crash. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-24mm/hmm: Remove racy protection against double-unregistrationJason Gunthorpe1-7/+1
No other register/unregister kernel API attempts to provide this kind of protection as it is inherently racy, so just drop it. Callers should provide their own protection, and it appears nouveau already does. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-18mm/hmm: Use lockdep instead of commentsJason Gunthorpe1-2/+2
So we can check locking at runtime. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-18mm/hmm: Hold on to the mmget for the lifetime of the rangeJason Gunthorpe2-47/+11
Range functions like hmm_range_snapshot() and hmm_range_fault() call find_vma, which requires hodling the mmget() and the mmap_sem for the mm. Make this simpler for the callers by holding the mmget() inside the range for the lifetime of the range. Other functions that accept a range should only be called if the range is registered. This has the side effect of directly preventing hmm_release() from happening while a range is registered. That means range->dead cannot be false during the lifetime of the range, so remove dead and hmm_mirror_mm_is_alive() entirely. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-18mm/hmm: Do not use list*_rcu() for hmm->rangesJason Gunthorpe1-2/+2
This list is always read and written while holding hmm->lock so there is no need for the confusing _rcu annotations. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <iweiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-18mm/hmm: Remove duplicate condition test before wait_event_timeoutJason Gunthorpe1-11/+2
The wait_event_timeout macro already tests the condition as its first action, so there is no reason to open code another version of this, all that does is skip the might_sleep() debugging in common cases, which is not helpful. Further, based on prior patches, we can now simplify the required condition test: - If range is valid memory then so is range->hmm - If hmm_release() has run then range->valid is set to false at the same time as dead, so no reason to check both. - A valid hmm has a valid hmm->mm. Allowing the return value of wait_event_timeout() (along with its internal barriers) to compute the result of the function. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-18mm/hmm: Simplify hmm_get_or_create and make it reliableJason Gunthorpe1-47/+30
As coded this function can false-fail in various racy situations. Make it reliable and simpler by running under the write side of the mmap_sem and avoiding the false-failing compare/exchange pattern. Due to the mmap_sem this no longer has to avoid racing with a 2nd parallel hmm_get_or_create(). Unfortunately this still has to use the page_table_lock as the non-sleeping lock protecting mm->hmm, since the contexts where we free the hmm are incompatible with mmap_sem. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-10mm/hmm: Hold a mmgrab from hmm to mmJason Gunthorpe3-22/+4
So long as a struct hmm pointer exists, so should the struct mm it is linked too. Hold the mmgrab() as soon as a hmm is created, and mmdrop() it once the hmm refcount goes to zero. Since mmdrop() (ie a 0 kref on struct mm) is now impossible with a !NULL mm->hmm delete the hmm_hmm_destroy(). Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-10mm/hmm: Use hmm_mirror not mm as an argument for hmm_range_registerJason Gunthorpe3-13/+9
Ralph observes that hmm_range_register() can only be called by a driver while a mirror is registered. Make this clear in the API by passing in the mirror structure as a parameter. This also simplifies understanding the lifetime model for struct hmm, as the hmm pointer must be valid as part of a registered mirror so all we need in hmm_register_range() is a simple kref_get. Suggested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-07mm/hmm: fix use after free with struct hmm in the mmu notifiersJason Gunthorpe2-6/+18
mmu_notifier_unregister_no_release() is not a fence and the mmu_notifier system will continue to reference hmm->mn until the srcu grace period expires. Resulting in use after free races like this: CPU0 CPU1 __mmu_notifier_invalidate_range_start() srcu_read_lock hlist_for_each () // mn == hmm->mn hmm_mirror_unregister() hmm_put() hmm_free() mmu_notifier_unregister_no_release() hlist_del_init_rcu(hmm-mn->list) mn->ops->invalidate_range_start(mn, range); mm_get_hmm() mm->hmm = NULL; kfree(hmm) mutex_lock(&hmm->lock); Use SRCU to kfree the hmm memory so that the notifiers can rely on hmm existing. Get the now-safe hmm struct through container_of and directly check kref_get_unless_zero to lock it against free. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
2019-06-06mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blockingKuehling, Felix1-1/+1
Don't set this flag by default in hmm_vma_do_fault. It is set conditionally just a few lines below. Setting it unconditionally can lead to handle_mm_fault doing a non-blocking fault, returning -EBUSY and unlocking mmap_sem unexpectedly. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-06mm/hmm: support automatic NUMA balancingPhilip Yang1-1/+1
While the page is migrating by NUMA balancing, HMM failed to detect this condition and still return the old page. Application will use the new page migrated, but driver pass the old page physical address to GPU, this crash the application later. Use pte_protnone(pte) to return this condition and then hmm_vma_do_fault will allocate new page. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-06mm/hmm: clean up some coding style and commentsRalph Campbell2-65/+68
There are no functional changes, just some coding style clean ups and minor comment changes. Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-06mm/hmm: update HMM documentationRalph Campbell2-70/+78
Update the HMM documentation to reflect the latest API and make a few minor wording changes. Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-06mm/hmm.c: suppress compilation warnings when CONFIG_HUGETLB_PAGE is not setJason Gunthorpe1-7/+2
gcc reports that several variables are defined but not used. For the first hunk CONFIG_HUGETLB_PAGE the entire if block is already protected by pud_huge() which is forced to 0. None of the stuff under the ifdef causes compilation problems as it is already stubbed out in the header files. For the second hunk the dummy huge_page_shift macro doesn't touch the argument, so just inline the argument. Link: http://lkml.kernel.org/r/20190522195151.GA23955@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2019-06-02Linux 5.2-rc3Linus Torvalds1-1/+1
2019-06-01mm, compaction: make sure we isolate a valid PFNSuzuki K Poulose1-1/+1
When we have holes in a normal memory zone, we could endup having cached_migrate_pfns which may not necessarily be valid, under heavy memory pressure with swapping enabled ( via __reset_isolation_suitable(), triggered by kswapd). Later if we fail to find a page via fast_isolate_freepages(), we may end up using the migrate_pfn we started the search with, as valid page. This could lead to accessing NULL pointer derefernces like below, due to an invalid mem_section pointer. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825] Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9 [0000000000000008] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] SMP ... CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018 pstate: 60000005 (nZCv daif -PAN -UAO) pc : set_pfnblock_flags_mask+0x58/0xe8 lr : compaction_alloc+0x300/0x950 [...] Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5) Call trace: set_pfnblock_flags_mask+0x58/0xe8 compaction_alloc+0x300/0x950 migrate_pages+0x1a4/0xbb0 compact_zone+0x750/0xde8 compact_zone_order+0xd8/0x118 try_to_compact_pages+0xb4/0x290 __alloc_pages_direct_compact+0x84/0x1e0 __alloc_pages_nodemask+0x5e0/0xe18 alloc_pages_vma+0x1cc/0x210 do_huge_pmd_anonymous_page+0x108/0x7c8 __handle_mm_fault+0xdd4/0x1190 handle_mm_fault+0x114/0x1c0 __get_user_pages+0x198/0x3c0 get_user_pages_unlocked+0xb4/0x1d8 __gfn_to_pfn_memslot+0x12c/0x3b8 gfn_to_pfn_prot+0x4c/0x60 kvm_handle_guest_abort+0x4b0/0xcd8 handle_exit+0x140/0x1b8 kvm_arch_vcpu_ioctl_run+0x260/0x768 kvm_vcpu_ioctl+0x490/0x898 do_vfs_ioctl+0xc4/0x898 ksys_ioctl+0x8c/0xa0 __arm64_sys_ioctl+0x28/0x38 el0_svc_common+0x74/0x118 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc Code: f8607840 f100001f 8b011401 9a801020 (f9400400) ---[ end trace af6a35219325a9b6 ]--- The issue was reported on an arm64 server with 128GB with holes in the zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while running 100 KVM guest instances. This patch fixes the issue by ensuring that the page belongs to a valid PFN when we fallback to using the lower limit of the scan range upon failure in fast_isolate_freepages(). Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reported-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01include/linux/generic-radix-tree.h: fix kerneldoc commentJonathan Corbet1-1/+1
The DOC comment block section in include/linux/generic-radix-tree.h contained a spurious colon, causing this warning in the documentation build: include/linux/generic-radix-tree.h:1: warning: no structured comments found Remove the colon and make the docs build happy. Link: http://lkml.kernel.org/r/20190524141933.74ae9050@lwn.net Signed-off-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01kernel/signal.c: trace_signal_deliver when signal_group_exitZhenliang Wei1-0/+2
In the fixes commit, removing SIGKILL from each thread signal mask and executing "goto fatal" directly will skip the call to "trace_signal_deliver". At this point, the delivery tracking of the SIGKILL signal will be inaccurate. Therefore, we need to add trace_signal_deliver before "goto fatal" after executing sigdelset. Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info. Link: http://lkml.kernel.org/r/20190425025812.91424-1-weizhenliang@huawei.com Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT") Signed-off-by: Zhenliang Wei <weizhenliang@huawei.com> Reviewed-by: Christian Brauner <christian@brauner.io> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ivan Delalande <colona@arista.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not usedQian Cai1-1/+2
Commit cf04eee8bf0e ("iommu/vt-d: Include ACPI devices in iommu=pt") added for_each_active_iommu() in iommu_prepare_static_identity_mapping() but never used the each element, i.e, "drhd->iommu". drivers/iommu/intel-iommu.c: In function 'iommu_prepare_static_identity_mapping': drivers/iommu/intel-iommu.c:3037:22: warning: variable 'iommu' set but not used [-Wunused-but-set-variable] struct intel_iommu *iommu; Fixed the warning by appending a compiler attribute __maybe_unused for it. Link: http://lkml.kernel.org/r/20190523013314.2732-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01spdxcheck.py: fix directory structuresVincenzo Frascino1-3/+4
The LICENSE directory has recently changed structure and this makes spdxcheck fails as per below: FAIL: "Blob or Tree named 'other' not found" Traceback (most recent call last): File "scripts/spdxcheck.py", line 240, in <module> spdx = read_spdxdata(repo) File "scripts/spdxcheck.py", line 41, in read_spdxdata for el in lictree[d].traverse(): [...] KeyError: "Blob or Tree named 'other' not found" Fix the script to restore the correctness on checkpatch License checking. References: 62be257e986d ("LICENSES: Rename other to deprecated") References: 8ea8814fcdcb ("LICENSES: Clearly mark dual license only licenses") Link: http://lkml.kernel.org/r/20190523084755.56739-1-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Joe Perches <joe@perches.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeremy Cline <jcline@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01kasan: initialize tag to 0xff in __kasan_kmallocNathan Chancellor1-1/+1
When building with -Wuninitialized and CONFIG_KASAN_SW_TAGS unset, Clang warns: mm/kasan/common.c:484:40: warning: variable 'tag' is uninitialized when used here [-Wuninitialized] kasan_unpoison_shadow(set_tag(object, tag), size); ^~~ set_tag ignores tag in this configuration but clang doesn't realize it at this point in its pipeline, as it points to arch_kasan_set_tag as being the point where it is used, which will later be expanded to (void *)(object) without a use of tag. Initialize tag to 0xff, as it removes this warning and doesn't change the meaning of the code. Link: https://github.com/ClangBuiltLinux/linux/issues/465 Link: http://lkml.kernel.org/r/20190502163057.6603-1-natechancellor@gmail.com Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01z3fold: fix sheduling while atomicVitaly Wool1-5/+6
kmem_cache_alloc() may be called from z3fold_alloc() in atomic context, so we need to pass correct gfp flags to avoid "scheduling while atomic" bug. Link: http://lkml.kernel.org/r/20190523153245.119dfeed55927e8755250ddd@gmail.com Fixes: 7c2b8baa61fe5 ("mm/z3fold.c: add structure for buddy handles") Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not setFabiano Rosas1-1/+2
CLK_GET_RATE_NOCACHE depends on CONFIG_COMMON_CLK. Importing constants.py when CONFIG_COMMON_CLK is not defined causes: (gdb) lx-symbols (...) File "scripts/gdb/linux/proc.py", line 15, in <module> from linux import constants File "scripts/gdb/linux/constants.py", line 2, in <module> LX_CLK_GET_RATE_NOCACHE = gdb.parse_and_eval("CLK_GET_RATE_NOCACHE") gdb.error: No symbol "CLK_GET_RATE_NOCACHE" in current context. Link: http://lkml.kernel.org/r/20190523195313.24701-1-farosas@linux.ibm.com Fixes: e7e6f462c1be ("scripts/gdb: print cached rate in lx-clk-summary") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Leonard Crestez <leonard.crestez@nxp.com> Cc: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01mm/gup: continue VM_FAULT_RETRY processing even for pre-faultsMike Rapoport1-7/+8
When get_user_pages*() is called with pages = NULL, the processing of VM_FAULT_RETRY terminates early without actually retrying to fault-in all the pages. If the pages in the requested range belong to a VMA that has userfaultfd registered, handle_userfault() returns VM_FAULT_RETRY *after* user space has populated the page, but for the gup pre-fault case there's no actual retry and the caller will get no pages although they are present. This issue was uncovered when running post-copy memory restore in CRIU after d9c9ce34ed5c ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails"). After this change, the copying of FPU state to the sigframe switched from copy_to_user() variants which caused a real page fault to get_user_pages() with pages parameter set to NULL. In post-copy mode of CRIU, the destination memory is managed with userfaultfd and lack of the retry for pre-fault case in get_user_pages() causes a crash of the restored process. Making the pre-fault behavior of get_user_pages() the same as the "normal" one fixes the issue. Link: http://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com Fixes: d9c9ce34ed5c ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Andrei Vagin <avagin@gmail.com> [https://travis-ci.org/avagin/linux/builds/533184940] Tested-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Borislav Petkov <bp@suse.de> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01ocfs2: fix error path kobject memory leakTobin C. Harding1-0/+1
If a call to kobject_init_and_add() fails we should call kobject_put() otherwise we leak memory. Add call to kobject_put() in the error path of call to kobject_init_and_add(). Please note, this has the side effect that the release method is called if kobject_init_and_add() fails. Link: http://lkml.kernel.org/r/20190513033458.2824-1-tobin@kernel.org Signed-off-by: Tobin C. Harding <tobin@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01memcg: make it work on sparse non-0-node systemsJiri Slaby2-5/+4
We have a single node system with node 0 disabled: Scanning NUMA topology in Northbridge 24 Number of physical nodes 2 Skipping disabled node 0 Node 1 MemBase 0000000000000000 Limit 00000000fbff0000 NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff] This causes crashes in memcg when system boots: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 #PF error: [normal kernel read fault] ... RIP: 0010:list_lru_add+0x94/0x170 ... Call Trace: d_lru_add+0x44/0x50 dput.part.34+0xfc/0x110 __fput+0x108/0x230 task_work_run+0x9f/0xc0 exit_to_usermode_loop+0xf5/0x100 It is reproducible as far as 4.12. I did not try older kernels. You have to have a new enough systemd, e.g. 241 (the reason is unknown -- was not investigated). Cannot be reproduced with systemd 234. The system crashes because the size of lru array is never updated in memcg_update_all_list_lrus and the reads are past the zero-sized array, causing dereferences of random memory. The root cause are list_lru_memcg_aware checks in the list_lru code. The test in list_lru_memcg_aware is broken: it assumes node 0 is always present, but it is not true on some systems as can be seen above. So fix this by avoiding checks on node 0. Remember the memcg-awareness by a bool flag in struct list_lru. Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01mm, memcg: consider subtrees in memory.eventsChris Down4-4/+36
memory.stat and other files already consider subtrees in their output, and we should too in order to not present an inconsistent interface. The current situation is fairly confusing, because people interacting with cgroups expect hierarchical behaviour in the vein of memory.stat, cgroup.events, and other files. For example, this causes confusion when debugging reclaim events under low, as currently these always read "0" at non-leaf memcg nodes, which frequently causes people to misdiagnose breach behaviour. The same confusion applies to other counters in this file when debugging issues. Aggregation is done at write time instead of at read-time since these counters aren't hot (unlike memory.stat which is per-page, so it does it at read time), and it makes sense to bundle this with the file notifications. After this patch, events are propagated up the hierarchy: [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events low 0 high 0 max 0 oom 0 oom_kill 0 [root@ktst ~]# systemd-run -p MemoryMax=1 true Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events low 0 high 0 max 7 oom 1 oom_kill 1 As this is a change in behaviour, this can be reverted to the old behaviour by mounting with the `memory_localevents' flag set. However, we use the new behaviour by default as there's a lack of evidence that there are any current users of memory.events that would find this change undesirable. akpm: this is a behaviour change, so Cc:stable. THis is so that forthcoming distros which use cgroup v2 are more likely to pick up the revised behaviour. Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01prctl_set_mm: downgrade mmap_sem to read lockMichal Koutný2-4/+11
The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap semaphore taken.") added synchronization of reading argument/environment boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct") avoided the coarse use of mmap_sem in similar situations. But there still remained two places that (mis)use mmap_sem. get_cmdline should also use arg_lock instead of mmap_sem when it reads the boundaries. The second place that should use arg_lock is in prctl_set_mm. By protecting the boundaries fields with the arg_lock, we can downgrade mmap_sem to reader lock (analogous to what we already do in prctl_set_mm_map). [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20190502125203.24014-3-mkoutny@suse.com Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Co-developed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01prctl_set_mm: refactor checks from validate_prctl_mapMichal Koutný1-26/+25
Despite comment of validate_prctl_map claims there are no capability checks, it is not completely true since commit 4d28df6152aa ("prctl: Allow local CAP_SYS_ADMIN changing exe_file"). Extract the check out of the function and make the function perform purely arithmetic checks. This patch should not change any behavior, it is mere refactoring for following patch. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20190502125203.24014-2-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01kernel/fork.c: make max_threads symbol staticKefeng Wang1-1/+1
Fix build warning, kernel/fork.c:125:5: warning: symbol 'max_threads' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20190516015118.140561-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01arch/arm/boot/compressed/decompress.c: fix build error due to lz4 changesSebastian Andrzej Siewior1-0/+1
include/linux/cpumask.h: In function 'cpumask_parse': include/linux/cpumask.h:636:21: error: implicit declaration of function 'strchrnul'; did you mean 'strchr'? [-Werror=implicit-function-declaration] Because arch/arm/boot/compressed/decompress.c does #define _LINUX_STRING_H_ preventing linux/string.h from providing strchrnul. It also #includes asm/string.h, which for arm has a declaration of strchr(), explaining why this didn't use to fail. Link: http://lkml.kernel.org/r/20190528115346.f5a7kn3hdnuf5rts@linutronix.de Fixes: 3713a4e1fdb8d ("include/linux/cpumask.h: fix double string traverse in cpumask_parse") Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Yury Norov <ynorov@marvell.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01arch/parisc/configs/c8000_defconfig: remove obsoleted CONFIG_DEBUG_SLAB_LEAKDavid Rientjes1-1/+0
CONFIG_DEBUG_SLAB_LEAK has been removed, so remove it from defconfig. Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1905201015460.96074@chino.kir.corp.google.com Fixes: 7878c231dae0 ("slab: remove /proc/slab_allocators") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01mm/vmalloc.c: fix typo in commentAndrew Morton1-1/+1
Reported-by: Nicholas Joll <najoll@posteo.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01lib/sort.c: fix kernel-doc notation warningsRandy Dunlap1-6/+9
Fix kernel-doc notation in lib/sort.c by using correct function parameter names. lib/sort.c:59: warning: Excess function parameter 'size' description in 'swap_words_32' lib/sort.c:83: warning: Excess function parameter 'size' description in 'swap_words_64' lib/sort.c:110: warning: Excess function parameter 'size' description in 'swap_bytes' Link: http://lkml.kernel.org/r/60e25d3d-68d1-bde2-3b39-e4baa0b14907@infradead.org Fixes: 37d0ec34d111a ("lib/sort: make swap functions more generic") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: George Spelvin <lkml@sdf.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01mm: fix Documentation/vm/hmm.rst Sphinx warningsRandy Dunlap1-3/+5
Fix Sphinx warnings in Documentation/vm/hmm.rst by using "::" notation and inserting a blank line. Also add a missing ';'. Documentation/vm/hmm.rst:292: WARNING: Unexpected indentation. Documentation/vm/hmm.rst:300: WARNING: Unexpected indentation. Link: http://lkml.kernel.org/r/c5995359-7c82-4e47-c7be-b58a4dda0953@infradead.org Fixes: 023a019a9b4e ("mm/hmm: add default fault flags to avoid the need to pre-fill pfns arrays") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01treewide: fix typos of SPDX-License-IdentifierMasahiro Yamada5-5/+5
Prior to the adoption of SPDX, it was difficult for tools to determine the correct license due to incomplete or badly formatted license text. The SPDX solves this issue, assuming people can correctly spell "SPDX-License-Identifier" although this assumption is broken in some places. Since scripts/spdxcheck.py parses only lines that exactly matches to the correct tag, it cannot (should not) detect this kind of error. If the correct tag is missing, scripts/checkpatch.pl warns like this: WARNING: Missing or malformed SPDX-License-Identifier tag in line * So, people should notice it before the patch submission, but in reality broken tags sometimes slip in. The checkpatch warning is not useful for checking the committed files globally since large number of files still have no SPDX tag. Also, I am not sure about the legal effect when the SPDX tag is broken. Anyway, these typos are absolutely worth fixing. It is pretty easy to find suspicious lines by grep. $ git grep --not -e SPDX-License-Identifier --and -e SPDX- -- \ :^LICENSES :^scripts/spdxcheck.py :^*/license-rules.rst arch/arm/kernel/bugs.c:// SPDX-Identifier: GPL-2.0 drivers/phy/st/phy-stm32-usbphyc.c:// SPDX-Licence-Identifier: GPL-2.0 drivers/pinctrl/sh-pfc/pfc-r8a77980.c:// SPDX-Lincense-Identifier: GPL 2.0 lib/test_stackinit.c:// SPDX-Licenses: GPLv2 sound/soc/codecs/max9759.c:// SPDX-Licence-Identifier: GPL-2.0 Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-01crypto: ux500 - fix license comment syntax errorAlex Xu (Hello71)1-1/+1
Causes error: drivers/crypto/ux500/cryp/Makefile:5: *** missing separator. Stop. Fixes: af873fcecef5 ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 194") Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-01MAINTAINERS: add I2C DT bindings to ARM platformsWolfram Sang1-0/+11
Reviewed-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Acked-by: Vladimir Zapolskiy <vz@mleia.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Patrice Chotard <patrice.chotard@st.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-06-01MAINTAINERS: add DT bindings to i2c driversWolfram Sang1-0/+7
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-05-31block: print offending values when cloned rq limits are exceededJohn Pittman1-2/+5
While troubleshooting issues where cloned request limits have been exceeded, it is often beneficial to know the actual values that have been breached. Print these values, assisting in ease of identification of root cause of the breach. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31blk-mq: Document the blk_mq_hw_queue_to_node() argumentsBart Van Assche1-1/+5
Document the meaning of the blk_mq_hw_queue_to_node() arguments. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31blk-mq: Fix spelling in a source code commentBart Van Assche1-2/+2
Change one occurrence of 'performace' into 'performance'. Cc: Max Gurtovoy <maxg@mellanox.com> Fixes: fe631457ff3e ("blk-mq: map all HWQ also in hyperthreaded system") # v4.13. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31block: Fix bsg_setup_queue() kernel-doc headerBart Van Assche1-0/+1
Document all bsg_setup_queue() arguments as required. Fixes: aae3b069d5ce ("bsg: pass in desired timeout handler") # v5.0. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31block: Fix rq_qos_wait() kernel-doc headerBart Van Assche1-3/+4
Add documentation for the @rqw argument and change " - " into ": ". Fixes: 84f603246db9 ("block: add rq_qos_wait to rq_qos") # v5.0-rc1~52^2~140. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31block: Fix blk_mq_*_map_queues() kernel-doc headersBart Van Assche3-5/+5
This patch avoids that the kernel-doc script complains about these function headers when building with W=1. Cc: Hannes Reinecke <hare@suse.com> Cc: Keith Busch <keith.busch@intel.com> Fixes: ed76e329d74a ("blk-mq: abstract out queue map") # v5.0. Fixes: e42b3867de4b ("blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues") # v5.0. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31block: Fix throtl_pending_timer_fn() kernel-doc headerBart Van Assche1-1/+1
Commit e99e88a9d2b0 renamed a function argument without updating the corresponding kernel-doc header. Update the kernel-doc header. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Reviewed-by: Kees Cook <keescook@chromium.org> Fixes: e99e88a9d2b0 ("treewide: setup_timer() -> timer_setup()") # v4.15. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31block: Convert blk_invalidate_devt() header into a non-kernel-doc headerBart Van Assche1-2/+2
This patch avoids that the kernel-doc tool warns about this function header when building with W=1. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31block/partitions/ldm: Convert a kernel-doc header into a non-kernel-doc headerBart Van Assche1-1/+1
This patch avoids that the kernel-doc tool warns about this function header when building with W=1. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31leds: avoid flush_work in atomic contextPavel Machek2-5/+5
It turns out that various triggers use led_blink_setup() from atomic context, so we can't do a flush_work there. Flush is still needed for slow LEDs, but we can move it to sysfs code where it is safe. WARNING: inconsistent lock state 5.2.0-rc1 #1 Tainted: G W -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes: 000000006e30541b ((work_completion)(&led_cdev->set_brightness_work)){+.?.}, at: +__flush_work+0x3b/0x38a {SOFTIRQ-ON-W} state was registered at: lock_acquire+0x146/0x1a1 __flush_work+0x5b/0x38a flush_work+0xb/0xd led_blink_setup+0x1e/0xd3 led_blink_set+0x3f/0x44 tpt_trig_timer+0xdb/0x106 ieee80211_mod_tpt_led_trig+0xed/0x112 Fixes: 0db37915d912 ("leds: avoid races with workqueue") Signed-off-by: Pavel Machek <pavel@ucw.cz> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>