aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/platforms/powernv/memtrace.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-08-13powerpc: rename powerpc_debugfs_root to arch_debugfs_dirAneesh Kumar K.V1-2/+1
No functional change in this patch. arch_debugfs_dir is the generic kernel name declared in linux/debugfs.h for arch-specific debugfs directory. Architectures like x86/s390 already use the name. Rename powerpc specific powerpc_debugfs_root to arch_debugfs_dir. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
2021-05-04powerpc/powernv/memtrace: Fix dcache flushingSandipan Das1-2/+2
Trace memory is cleared and the corresponding dcache lines are flushed after allocation. However, this should not be done using the PFN. This adds the missing conversion to virtual address. Fixes: 2ac02e5ecec0 ("powerpc/mm: Remove dcache flush from memory remove.") Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210501160254.1179831-1-sandipan@linux.ibm.com
2021-04-08powerpc/powernv/memtrace: Allow mmaping trace buffersJordan Niethe1-1/+17
Let the memory removed from the linear mapping to be used for the trace buffers be mmaped. This is a useful way of providing cache-inhibited memory for the alignment_handler selftest. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: make memtrace_mmap() static as noticed by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210225032108.1458352-1-jniethe5@gmail.com
2021-02-11powerpc/mm: Remove dcache flush from memory remove.Aneesh Kumar K.V1-0/+29
We added dcache flush on memory add/remove in commit fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug") to handle crashes on GPU hotplug. Instead of adding dcache flush in generic memory add/remove routine which is used even for regular memory, we should handle these devices specific flush in the device driver code. memtrace did handle this in the driver and that was removed by commit 7fd6641de28f ("powerpc/powernv/memtrace: Let the arch hotunplug code flush cache"). This patch reverts that commit. The dcache flush in memory add was removed by commit ea458effa88e ("powerpc: Don't flush caches when adding memory") which I don't think is correct. The reason why we require dcache flush in memtrace is to make sure we don't have a dirty cache when we remap a pfn to cache inhibited. We should do that when the memtrace module removes the memory and make the pfn available for HTM traces to map it as cache inhibited. The other device mentioned in commit fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug") is nvlink device with coherent memory. The support for that was removed in commit 7eb3cf761927 ("powerpc/powernv: remove unused NPU DMA code") and commit 25b2995a35b6 ("mm: remove MEMORY_DEVICE_PUBLIC support") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-3-aneesh.kumar@linux.ibm.com
2020-11-19powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocationsDavid Hildenbrand1-105/+58
Let's use alloc_contig_pages() for allocating memory and remove the linear mapping manually via arch_remove_linear_mapping(). Mark all pages PG_offline, such that they will definitely not get touched - e.g., when hibernating. When freeing memory, try to revert what we did. The original idea was discussed in: https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com This is similar to CONFIG_DEBUG_PAGEALLOC handling on other architectures, whereby only single pages are unmapped from the linear mapping. Let's mimic what memory hot(un)plug would do with the linear mapping. We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Add a TODO that we want to use __GFP_ZERO for clearing once alloc_contig_pages() understands that. Tested with in QEMU/TCG with 10 GiB of main memory: [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 105.903043][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000 [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 145.042493][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages [ 145.049019][ T1080] memtrace: Freed trace memory back on node 0 [ 145.333960][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000 [root@localhost ~]# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 213.606916][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages [ 213.613855][ T1080] memtrace: Freed trace memory back on node 0 [ 214.185094][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000 [root@localhost ~]# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 234.874872][ T1080] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages [ 234.886974][ T1080] memtrace: Freed trace memory back on node 0 [ 234.890153][ T1080] memtrace: Failed to allocate trace memory on node 0 [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 259.490196][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000 I also made sure allocated memory is properly zeroed. Note 1: We currently won't be allocating from ZONE_MOVABLE - because our pages are not movable. However, as we don't run with any memory hot(un)plug mechanism around, we could make an exception to increase the chance of allocations succeeding. Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used along PG_reserved in hibernation code will always return "true" on powerpc, resulting in the pages getting touched. It's too generic - e.g., indicates boot allocations. Note 3: For now, we keep using memory_block_size_bytes() as minimum granularity. Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-9-david@redhat.com
2020-11-19powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrentlyDavid Hildenbrand1-7/+15
It's very easy to crash the kernel right now by simply trying to enable memtrace concurrently, hammering on the "enable" interface loop.sh: #!/bin/bash dmesg --console-off while true; do echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable done [root@localhost ~]# loop.sh & [root@localhost ~]# loop.sh & Resulting quickly in a kernel crash. Let's properly protect using a mutex. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org# v4.14+ Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com
2020-11-19powerpc/powernv/memtrace: Don't leak kernel memory to user spaceDavid Hildenbrand1-0/+22
We currently leak kernel memory to user space, because memory offlining doesn't do any implicit clearing of memory and we are missing explicit clearing of memory. Let's keep it simple and clear pages before removing the linear mapping. Reproduced in QEMU/TCG with 10 GiB of main memory: [root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null [... wait until "free -m" used counter no longer changes and cancel] 19665802+0 records in 1+0 records out 9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s [root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes 40000000 [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900 [ 402.980063][ T1086] flags: 0x7ffff000001000(reserved) [ 402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000 [ 402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 402.980845][ T1086] page dumped because: unmovable page [ 402.989608][ T1086] Offlined Pages 16384 [ 403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000 Before this patch: [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head 00000000 c8 25 72 51 4d 26 36 c5 5c c2 56 15 d5 1a cd 10 |.%rQM&6.\.V.....| 00000010 19 b9 50 b2 cb e3 60 b8 ec 0a f3 ec 4b 3c 39 f0 |..P...`.....K<9.|$ 00000020 4e 5a 4c cf bd 26 19 ff 37 79 13 67 24 b7 b8 57 |NZL..&..7y.g$..W|$ 00000030 98 3e f5 be 6f 14 6a bd a4 52 bc 6e e9 e0 c1 5d |.>..o.j..R.n...]|$ 00000040 76 b3 ae b5 88 d7 da e3 64 23 85 2c 10 88 07 b6 |v.......d#.,....|$ 00000050 9a d8 91 de f7 50 27 69 2e 64 9c 6f d3 19 45 79 |.....P'i.d.o..Ey|$ 00000060 6a 6f 8a 61 71 19 1f c7 f1 df 28 26 ca 0f 84 55 |jo.aq.....(&...U|$ 00000070 01 3f be e4 e2 e1 da ff 7b 8c 8e 32 37 b4 24 53 |.?......{..27.$S|$ 00000080 1b 70 30 45 56 e6 8c c4 0e b5 4c fb 9f dd 88 06 |.p0EV.....L.....|$ 00000090 ef c4 18 79 f1 60 b1 5c 79 59 4d f4 36 d7 4a 5c |...y.`.\yYM.6.J\|$ After this patch: [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 40000000 Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com
2020-10-16mm/memory_hotplug: prepare passing flags to add_memory() and friendsDavid Hildenbrand1-1/+1
We soon want to pass flags, e.g., to mark added System RAM resources. mergeable. Prepare for that. This patch is based on a similar patch by Oscar Salvador: https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Juergen Gross <jgross@suse.com> # Xen related part Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Libor Pechacek <lpechacek@suse.cz> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Leonardo Bras <leobras.c@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julien Grall <julien@xen.org> Cc: Kees Cook <keescook@chromium.org> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richardw.yang@linux.intel.com> Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07powernv/memtrace: always online added memory blocksDavid Hildenbrand1-10/+4
Let's always try to online the re-added memory blocks. In case add_memory() already onlined the added memory blocks, the first device_online() call will fail and stop processing the remaining memory blocks. This avoids manually having to check memhp_auto_online. Note: PPC always onlines all hotplugged memory directly from the kernel as well - something that is handled by user space on other architectures. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Yumei Huang <yuhuang@redhat.com> Link: http://lkml.kernel.org/r/20200317104942.11178-5-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-04powerpc/powernv: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-7/+0
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200209105901.1620958-6-gregkh@linuxfoundation.org
2019-07-18mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfnsDavid Hildenbrand1-12/+11
walk_memory_range() was once used to iterate over sections. Now, it iterates over memory blocks. Rename the function, fixup the documentation. Also, pass start+size instead of PFNs, which is what most callers already have at hand. (we'll rework link_mem_sections() most probably soon) Follow-up patches will rework, simplify, and move walk_memory_blocks() to drivers/base/memory.c. Note: walk_memory_blocks() only works correctly right now if the start_pfn is aligned to a section start. This is the case right now, but we'll generalize the function in a follow up patch so the semantics match the documentation. [akpm@linux-foundation.org: remove unused variable] Link: http://lkml.kernel.org/r/20190614100114.311-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Arun KS <arunks@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner1-5/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-05mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual1-2/+3
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31powerpc/powernv: hold device_hotplug_lock when calling memtrace_offline_pages()David Hildenbrand1-1/+3
Let's perform all checking + offlining + removing under device_hotplug_lock, so nobody can mess with these devices via sysfs concurrently. [david@redhat.com: take device_hotplug_lock outside of loop] Link: http://lkml.kernel.org/r/20180927092554.13567-6-david@redhat.com Link: http://lkml.kernel.org/r/20180925091457.28651-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <lenb@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31powerpc/powernv: hold device_hotplug_lock when calling device_online()David Hildenbrand1-0/+2
device_online() should be called with device_hotplug_lock() held. Link: http://lkml.kernel.org/r/20180925091457.28651-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <lenb@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31mm/memory_hotplug: make remove_memory() take the device_hotplug_lockDavid Hildenbrand1-1/+1
Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3. Reading through the code and studying how mem_hotplug_lock is to be used, I noticed that there are two places where we can end up calling device_online()/device_offline() - online_pages()/offline_pages() without the mem_hotplug_lock. And there are other places where we call device_online()/device_offline() without the device_hotplug_lock. While e.g. echo "online" > /sys/devices/system/memory/memory9/state is fine, e.g. echo 1 > /sys/devices/system/memory/memory9/online Will not take the mem_hotplug_lock. However the device_lock() and device_hotplug_lock. E.g. via memory_probe_store(), we can end up calling add_memory()->online_pages() without the device_hotplug_lock. So we can have concurrent callers in online_pages(). We e.g. touch in online_pages() basically unprotected zone->present_pages then. Looks like there is a longer history to that (see Patch #2 for details), and fixing it to work the way it was intended is not really possible. We would e.g. have to take the mem_hotplug_lock in device/base/core.c, which sounds wrong. Summary: We had a lock inversion on mem_hotplug_lock and device_lock(). More details can be found in patch 3 and patch 6. I propose the general rules (documentation added in patch 6): 1. add_memory/add_memory_resource() must only be called with device_hotplug_lock. 2. remove_memory() must only be called with device_hotplug_lock. This is already documented and holds for all callers. 3. device_online()/device_offline() must only be called with device_hotplug_lock. This is already documented and true for now in core code. Other callers (related to memory hotplug) have to be fixed up. 4. mem_hotplug_lock is taken inside of add_memory/remove_memory/ online_pages/offline_pages. To me, this looks way cleaner than what we have right now (and easier to verify). And looking at the documentation of remove_memory, using lock_device_hotplug also for add_memory() feels natural. This patch (of 6): remove_memory() is exported right now but requires the device_hotplug_lock, which is not exported. So let's provide a variant that takes the lock and only export that one. The lock is already held in arch/powerpc/platforms/pseries/hotplug-memory.c drivers/acpi/acpi_memhotplug.c arch/powerpc/platforms/powernv/memtrace.c Apart from that, there are not other users in the tree. Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-19powerpc/memtrace: Remove memory in chunksRashmica Gupta1-5/+16
When hot-removing memory release_mem_region_adjustable() splits iomem resources if they are not the exact size of the memory being hot-deleted. Adding this memory back to the kernel adds a new resource. Eg a node has memory 0x0 - 0xfffffffff. Hot-removing 1GB from 0xf40000000 results in the single resource 0x0-0xfffffffff being split into two resources: 0x0-0xf3fffffff and 0xf80000000-0xfffffffff. When we hot-add the memory back we now have three resources: 0x0-0xf3fffffff, 0xf40000000-0xf7fffffff, and 0xf80000000-0xfffffffff. This is an issue if we try to remove some memory that overlaps resources. Eg when trying to remove 2GB at address 0xf40000000, release_mem_region_adjustable() fails as it expects the chunk of memory to be within the boundaries of a single resource. We then get the warning: "Unable to release resource" and attempting to use memtrace again gives us this error: "bash: echo: write error: Resource temporarily unavailable" This patch makes memtrace remove memory in chunks that are always the same size from an address that is always equal to end_of_memory - n*size, for some n. So hotremoving and hotadding memory of different sizes will now not attempt to remove memory that spans multiple resources. Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc/powernv: Allow memory that has been hot-removed to be hot-addedRashmica Gupta1-7/+85
This patch allows the memory removed by memtrace to be readded to the kernel. So now you don't have to reboot your system to add the memory back to the kernel or to have a different amount of memory removed. Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02powerpc/powernv/memtrace: Remove memtrace mmap()Michael Neuling1-29/+0
debugfs doesn't support mmap(), so this code is never used. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-14powerpc/powernv: Fix memtrace build when NUMA=nMichael Ellerman1-1/+1
Currently memtrace doesn't build if NUMA=n: In function ‘memtrace_alloc_node’: arch/powerpc/platforms/powernv/memtrace.c:134:6: error: the address of ‘contig_page_data’ will always evaluate as ‘true’ if (!NODE_DATA(nid) || !node_spanned_pages(nid)) ^ This is because for NUMA=n NODE_DATA(nid) points to an always allocated structure, contig_page_data. But even in the NUMA=y case memtrace_alloc_node() is only called for online nodes, and we should always have a NODE_DATA() allocated for an online node. So remove the (hopefully) overly paranoid check, which also means we can build when NUMA=n. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/memtrace: Let the arch hotunplug code flush cacheBalbir Singh1-17/+0
Don't do this via custom code, instead now that we have support in the arch hotplug/hotunplug code, rely on those routines to do the right thing. The existing flush doesn't work because it uses ppc64_caches.l1d.size instead of ppc64_caches.l1d.line_size. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24powerpc/powernv: Enable removal of memory for in memory tracingRashmica Gupta1-0/+282
The hardware trace macro feature requires access to a chunk of real memory. This patch provides a debugfs interface to do this. By writing an integer containing the size of memory to be unplugged into /sys/kernel/debug/powerpc/memtrace/enable, the code will attempt to remove that much memory from the end of each NUMA node. This patch also adds additional debugsfs files for each node that allows the tracer to interact with the removed memory, as well as a trace file that allows userspace to read the generated trace. Note that this patch does not invoke the hardware trace macro, it only allows memory to be removed during runtime for the trace macro to utilise. Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> [mpe: Minor formatting etc fixups] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>