From 7d8faaf155454f8798ec56404faca29a82689c77 Mon Sep 17 00:00:00 2001 From: Zach O'Keefe Date: Wed, 6 Jul 2022 16:59:27 -0700 Subject: mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse This idea was introduced by David Rientjes[1]. Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are: * CPU is charged to the process that wants to spend the cycles for the THP * Avoid unpredictable timing of khugepaged collapse Semantics This call is independent of the system-wide THP sysfs settings, but will fail for memory marked VM_NOHUGEPAGE. If the ranges provided span multiple VMAs, the semantics of the collapse over each VMA is independent from the others. This implies a hugepage cannot cross a VMA boundary. If collapse of a given hugepage-aligned/sized region fails, the operation may continue to attempt collapsing the remainder of memory specified. The memory ranges provided must be page-aligned, but are not required to be hugepage-aligned. If the memory ranges are not hugepage-aligned, the start/end of the range will be clamped to the first/last hugepage-aligned address covered by said range. The memory ranges must span at least one hugepage-sized region. All non-resident pages covered by the range will first be swapped/faulted-in, before being internally copied onto a freshly allocated hugepage. Unmapped pages will have their data directly initialized to 0 in the new hugepage. However, for every eligible hugepage aligned/sized region to-be collapsed, at least one page must currently be backed by memory (a PMD covering the address range must already exist). Allocation for the new hugepage may enter direct reclaim and/or compaction, regardless of VMA flags. When the system has multiple NUMA nodes, the hugepage will be allocated from the node providing the most native pages. This operation operates on the current state of the specified process and makes no persistent changes or guarantees on how pages will be mapped, constructed, or faulted in the future Return Value If all hugepage-sized/aligned regions covered by the provided range were either successfully collapsed, or were already PMD-mapped THPs, this operation will be deemed successful. On success, process_madvise(2) returns the number of bytes advised, and madvise(2) returns 0. Else, -1 is returned and errno is set to indicate the error for the most-recently attempted hugepage collapse. Note that many failures might have occurred, since the operation may continue to collapse in the event a single hugepage-sized/aligned region fails. ENOMEM Memory allocation failed or VMA not found EBUSY Memcg charging failed EAGAIN Required resource temporarily unavailable. Try again might succeed. EINVAL Other error: No PMD found, subpage doesn't have Present bit set, "Special" page no backed by struct page, VMA incorrectly sized, address not page-aligned, ... Most notable here is ENOMEM and EBUSY (new to madvise) which are intended to provide the caller with actionable feedback so they may take an appropriate fallback measure. Use Cases An immediate user of this new functionality are malloc() implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED; zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage coverage and dTLB performance. TCMalloc is such an implementation that could benefit from this[2]. Only privately-mapped anon memory is supported for now, but additional support for file, shmem, and HugeTLB high-granularity mappings[2] is expected. File and tmpfs/shmem support would permit: * Backing executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevents page sharing and demand paging, both of which increase steady state memory footprint. With MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance and lower RAM footprints. * Backing guest memory by hugapages after the memory contents have been migrated in native-page-sized chunks to a new host, in a userfaultfd-based live-migration stack. [1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/ [2] https://github.com/google/tcmalloc/tree/master/tcmalloc [jrdr.linux@gmail.com: avoid possible memory leak in failure path] Link: https://lkml.kernel.org/r/20220713024109.62810-1-jrdr.linux@gmail.com [zokeefe@google.com add missing kfree() to madvise_collapse()] Link: https://lore.kernel.org/linux-mm/20220713024109.62810-1-jrdr.linux@gmail.com/ Link: https://lkml.kernel.org/r/20220713161851.1879439-1-zokeefe@google.com [zokeefe@google.com: delay computation of hpage boundaries until use]] Link: https://lkml.kernel.org/r/20220720140603.1958773-4-zokeefe@google.com Link: https://lkml.kernel.org/r/20220706235936.2197195-10-zokeefe@google.com Signed-off-by: Zach O'Keefe Signed-off-by: "Souptick Joarder (HPE)" Suggested-by: David Rientjes Cc: Alex Shi Cc: Andrea Arcangeli Cc: Arnd Bergmann Cc: Axel Rasmussen Cc: Chris Kennelly Cc: Chris Zankel Cc: David Hildenbrand Cc: Helge Deller Cc: Hugh Dickins Cc: Ivan Kokshaysky Cc: James Bottomley Cc: Jens Axboe Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Matt Turner Cc: Max Filippov Cc: Miaohe Lin Cc: Michal Hocko Cc: Minchan Kim Cc: Pasha Tatashin Cc: Pavel Begunkov Cc: Peter Xu Cc: Rongwei Wang Cc: SeongJae Park Cc: Song Liu Cc: Thomas Bogendoerfer Cc: Vlastimil Babka Cc: Yang Shi Cc: Zi Yan Cc: Dan Carpenter Signed-off-by: Andrew Morton --- arch/xtensa/include/uapi/asm/mman.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/xtensa') diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index 7966a58af472..1ff0c858544f 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -111,6 +111,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 -- cgit v1.2.3-59-g8ed1b From 0192445cb2f7ed1cd7a95a0fc8c7645480baba25 Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Mon, 15 Aug 2022 10:39:59 -0400 Subject: arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER This Kconfig option is used by individual arch to set its desired MAX_ORDER. Rename it to reflect its actual use. Link: https://lkml.kernel.org/r/20220815143959.1511278-1-zi.yan@sent.com Acked-by: Mike Rapoport Signed-off-by: Zi Yan Acked-by: Guo Ren [csky] Acked-by: Arnd Bergmann Acked-by: Catalin Marinas [arm64] Acked-by: Huacai Chen [LoongArch] Acked-by: Michael Ellerman [powerpc] Cc: Vineet Gupta Cc: Taichi Sugaya Cc: Neil Armstrong Cc: Qin Jian Cc: Guo Ren Cc: Geert Uytterhoeven Cc: Thomas Bogendoerfer Cc: Dinh Nguyen Cc: Christophe Leroy Cc: Yoshinori Sato Cc: "David S. Miller" Cc: Chris Zankel Cc: Ley Foon Tan Signed-off-by: Andrew Morton --- arch/arc/Kconfig | 2 +- arch/arm/Kconfig | 2 +- arch/arm/configs/imx_v6_v7_defconfig | 2 +- arch/arm/configs/milbeaut_m10v_defconfig | 2 +- arch/arm/configs/oxnas_v6_defconfig | 2 +- arch/arm/configs/pxa_defconfig | 2 +- arch/arm/configs/sama7_defconfig | 2 +- arch/arm/configs/sp7021_defconfig | 2 +- arch/arm64/Kconfig | 2 +- arch/csky/Kconfig | 2 +- arch/ia64/Kconfig | 2 +- arch/ia64/include/asm/sparsemem.h | 6 +++--- arch/loongarch/Kconfig | 2 +- arch/m68k/Kconfig.cpu | 2 +- arch/mips/Kconfig | 2 +- arch/nios2/Kconfig | 2 +- arch/powerpc/Kconfig | 2 +- arch/powerpc/configs/85xx/ge_imp3a_defconfig | 2 +- arch/powerpc/configs/fsl-emb-nonhw.config | 2 +- arch/sh/configs/ecovec24_defconfig | 2 +- arch/sh/mm/Kconfig | 2 +- arch/sparc/Kconfig | 2 +- arch/xtensa/Kconfig | 2 +- include/linux/mmzone.h | 4 ++-- 24 files changed, 27 insertions(+), 27 deletions(-) (limited to 'arch/xtensa') diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 9e3653253ef2..d9a13ccf89a3 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -554,7 +554,7 @@ config ARC_BUILTIN_DTB_NAME endmenu # "ARC Architecture Configuration" -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" default "12" if ARC_HUGEPAGE_16M default "11" diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 87badeae3181..e6c8ee56ac52 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1434,7 +1434,7 @@ config ARM_MODULE_PLTS Disabling this is usually safe for small single-platform configurations. If unsure, say y. -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" default "12" if SOC_AM33XX default "9" if SA1111 diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig index 01012537a9b9..fb283059daa0 100644 --- a/arch/arm/configs/imx_v6_v7_defconfig +++ b/arch/arm/configs/imx_v6_v7_defconfig @@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y CONFIG_SMP=y CONFIG_ARM_PSCI=y CONFIG_HIGHMEM=y -CONFIG_FORCE_MAX_ZONEORDER=14 +CONFIG_ARCH_FORCE_MAX_ORDER=14 CONFIG_CMDLINE="noinitrd console=ttymxc0,115200" CONFIG_KEXEC=y CONFIG_CPU_FREQ=y diff --git a/arch/arm/configs/milbeaut_m10v_defconfig b/arch/arm/configs/milbeaut_m10v_defconfig index 58810e98de3d..8620061e19a8 100644 --- a/arch/arm/configs/milbeaut_m10v_defconfig +++ b/arch/arm/configs/milbeaut_m10v_defconfig @@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set # CONFIG_ARM_PATCH_IDIV is not set CONFIG_HIGHMEM=y -CONFIG_FORCE_MAX_ZONEORDER=12 +CONFIG_ARCH_FORCE_MAX_ORDER=12 CONFIG_SECCOMP=y CONFIG_KEXEC=y CONFIG_EFI=y diff --git a/arch/arm/configs/oxnas_v6_defconfig b/arch/arm/configs/oxnas_v6_defconfig index 600f78b363dd..5c163a9d1429 100644 --- a/arch/arm/configs/oxnas_v6_defconfig +++ b/arch/arm/configs/oxnas_v6_defconfig @@ -12,7 +12,7 @@ CONFIG_ARCH_OXNAS=y CONFIG_MACH_OX820=y CONFIG_SMP=y CONFIG_NR_CPUS=16 -CONFIG_FORCE_MAX_ZONEORDER=12 +CONFIG_ARCH_FORCE_MAX_ORDER=12 CONFIG_SECCOMP=y CONFIG_ARM_APPENDED_DTB=y CONFIG_ARM_ATAG_DTB_COMPAT=y diff --git a/arch/arm/configs/pxa_defconfig b/arch/arm/configs/pxa_defconfig index 104a45722799..ce3f4ed50498 100644 --- a/arch/arm/configs/pxa_defconfig +++ b/arch/arm/configs/pxa_defconfig @@ -21,7 +21,7 @@ CONFIG_MACH_AKITA=y CONFIG_MACH_BORZOI=y CONFIG_PXA_SYSTEMS_CPLDS=y CONFIG_AEABI=y -CONFIG_FORCE_MAX_ZONEORDER=9 +CONFIG_ARCH_FORCE_MAX_ORDER=9 CONFIG_CMDLINE="root=/dev/ram0 ro" CONFIG_KEXEC=y CONFIG_CPU_FREQ=y diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig index 0384030d8b25..8b2cf6ddd568 100644 --- a/arch/arm/configs/sama7_defconfig +++ b/arch/arm/configs/sama7_defconfig @@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y # CONFIG_CACHE_L2X0 is not set # CONFIG_ARM_PATCH_IDIV is not set # CONFIG_CPU_SW_DOMAIN_PAN is not set -CONFIG_FORCE_MAX_ZONEORDER=15 +CONFIG_ARCH_FORCE_MAX_ORDER=15 CONFIG_UACCESS_WITH_MEMCPY=y # CONFIG_ATAGS is not set CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel" diff --git a/arch/arm/configs/sp7021_defconfig b/arch/arm/configs/sp7021_defconfig index 703b9aaa40f0..151ca8c47373 100644 --- a/arch/arm/configs/sp7021_defconfig +++ b/arch/arm/configs/sp7021_defconfig @@ -18,7 +18,7 @@ CONFIG_ARCH_SUNPLUS=y # CONFIG_VDSO is not set CONFIG_SMP=y CONFIG_THUMB2_KERNEL=y -CONFIG_FORCE_MAX_ZONEORDER=12 +CONFIG_ARCH_FORCE_MAX_ORDER=12 CONFIG_VFP=y CONFIG_NEON=y CONFIG_MODULES=y diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 9fb9fff08c94..c5c7d812704c 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1418,7 +1418,7 @@ config XEN help Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64. -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int default "14" if ARM64_64K_PAGES default "12" if ARM64_16K_PAGES diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig index 3cbc2dc62baf..adee6ab36862 100644 --- a/arch/csky/Kconfig +++ b/arch/csky/Kconfig @@ -332,7 +332,7 @@ config HIGHMEM select KMAP_LOCAL default y -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" default "11" diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 26ac8ea15a9e..c6e06cdc738f 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -200,7 +200,7 @@ config IA64_CYCLONE Say Y here to enable support for IBM EXA Cyclone time source. If you're unsure, answer N. -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "MAX_ORDER (11 - 17)" if !HUGETLB_PAGE range 11 17 if !HUGETLB_PAGE default "17" if HUGETLB_PAGE diff --git a/arch/ia64/include/asm/sparsemem.h b/arch/ia64/include/asm/sparsemem.h index 42ed5248fae9..84e8ce387b69 100644 --- a/arch/ia64/include/asm/sparsemem.h +++ b/arch/ia64/include/asm/sparsemem.h @@ -11,10 +11,10 @@ #define SECTION_SIZE_BITS (30) #define MAX_PHYSMEM_BITS (50) -#ifdef CONFIG_FORCE_MAX_ZONEORDER -#if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS) +#ifdef CONFIG_ARCH_FORCE_MAX_ORDER +#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS) #undef SECTION_SIZE_BITS -#define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) +#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) #endif #endif diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 26aeb1408e56..3c7a5a54b808 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -370,7 +370,7 @@ config NODES_SHIFT default "6" depends on NUMA -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" range 14 64 if PAGE_SIZE_64KB default "14" if PAGE_SIZE_64KB diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu index e0e9e31339c1..3b2f39508524 100644 --- a/arch/m68k/Kconfig.cpu +++ b/arch/m68k/Kconfig.cpu @@ -399,7 +399,7 @@ config SINGLE_MEMORY_CHUNK order" to save memory that could be wasted for unused memory map. Say N if not sure. -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" if ADVANCED depends on !SINGLE_MEMORY_CHUNK default "11" diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index ec21f8999249..70d28976a40d 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -2140,7 +2140,7 @@ config PAGE_SIZE_64KB endchoice -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig index 4167f1eb4cd8..a582f72104f3 100644 --- a/arch/nios2/Kconfig +++ b/arch/nios2/Kconfig @@ -44,7 +44,7 @@ menu "Kernel features" source "kernel/Kconfig.hz" -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" range 9 20 default "11" diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 4c466acdc70d..39d71d7701bd 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -845,7 +845,7 @@ config DATA_SHIFT in that case. If PIN_TLB is selected, it must be aligned to 8M as 8M pages will be pinned. -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" range 8 9 if PPC64 && PPC_64K_PAGES default "9" if PPC64 && PPC_64K_PAGES diff --git a/arch/powerpc/configs/85xx/ge_imp3a_defconfig b/arch/powerpc/configs/85xx/ge_imp3a_defconfig index f29c166998af..e7672c186325 100644 --- a/arch/powerpc/configs/85xx/ge_imp3a_defconfig +++ b/arch/powerpc/configs/85xx/ge_imp3a_defconfig @@ -30,7 +30,7 @@ CONFIG_PREEMPT=y # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set CONFIG_BINFMT_MISC=m CONFIG_MATH_EMULATION=y -CONFIG_FORCE_MAX_ZONEORDER=17 +CONFIG_ARCH_FORCE_MAX_ORDER=17 CONFIG_PCI=y CONFIG_PCIEPORTBUS=y CONFIG_PCI_MSI=y diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config index f14c6dbd7346..ab8a8c4530d9 100644 --- a/arch/powerpc/configs/fsl-emb-nonhw.config +++ b/arch/powerpc/configs/fsl-emb-nonhw.config @@ -41,7 +41,7 @@ CONFIG_FIXED_PHY=y CONFIG_FONT_8x16=y CONFIG_FONT_8x8=y CONFIG_FONTS=y -CONFIG_FORCE_MAX_ZONEORDER=13 +CONFIG_ARCH_FORCE_MAX_ORDER=13 CONFIG_FRAMEBUFFER_CONSOLE=y CONFIG_FRAME_WARN=1024 CONFIG_FTL=y diff --git a/arch/sh/configs/ecovec24_defconfig b/arch/sh/configs/ecovec24_defconfig index e699e2e04128..b52e14ccb450 100644 --- a/arch/sh/configs/ecovec24_defconfig +++ b/arch/sh/configs/ecovec24_defconfig @@ -8,7 +8,7 @@ CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_BLK_DEV_BSG is not set CONFIG_CPU_SUBTYPE_SH7724=y -CONFIG_FORCE_MAX_ZONEORDER=12 +CONFIG_ARCH_FORCE_MAX_ORDER=12 CONFIG_MEMORY_SIZE=0x10000000 CONFIG_FLATMEM_MANUAL=y CONFIG_SH_ECOVEC=y diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig index ba569cfb4368..411fdc0901f7 100644 --- a/arch/sh/mm/Kconfig +++ b/arch/sh/mm/Kconfig @@ -18,7 +18,7 @@ config PAGE_OFFSET default "0x80000000" if MMU default "0x00000000" -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" range 9 64 if PAGE_SIZE_16KB default "9" if PAGE_SIZE_16KB diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 1c852bb530ec..4d3d1af90d52 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -269,7 +269,7 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" default "13" help diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig index 12ac277282ba..bcb0c5d2abc2 100644 --- a/arch/xtensa/Kconfig +++ b/arch/xtensa/Kconfig @@ -771,7 +771,7 @@ config HIGHMEM If unsure, say Y. -config FORCE_MAX_ZONEORDER +config ARCH_FORCE_MAX_ORDER int "Maximum zone order" default "11" help diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 025754b0bc09..fd61347b4b1f 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -24,10 +24,10 @@ #include /* Free memory management - zoned buddy allocator. */ -#ifndef CONFIG_FORCE_MAX_ZONEORDER +#ifndef CONFIG_ARCH_FORCE_MAX_ORDER #define MAX_ORDER 11 #else -#define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER +#define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER #endif #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1)) -- cgit v1.2.3-59-g8ed1b From 49c40fb4b826c90036f04abf583bb4cb5ba3d203 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 6 Sep 2022 19:48:55 +0000 Subject: xtensa: remove vma linked list walks Use the VMA iterator instead. Since VMA can no longer be NULL in the loop, then deal with out-of-memory outside the loop. This means a slightly longer run time in the failure case (-ENOMEM) - it will run to the end of the VMAs before erroring instead of in the middle of the loop. Link: https://lkml.kernel.org/r/20220906194824.2110408-37-Liam.Howlett@oracle.com Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Liam R. Howlett Reviewed-by: Davidlohr Bueso Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: SeongJae Park Cc: Sven Schnelle Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- arch/xtensa/kernel/syscall.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) (limited to 'arch/xtensa') diff --git a/arch/xtensa/kernel/syscall.c b/arch/xtensa/kernel/syscall.c index 201356faa7e6..b3c2450d6f23 100644 --- a/arch/xtensa/kernel/syscall.c +++ b/arch/xtensa/kernel/syscall.c @@ -58,6 +58,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct vm_area_struct *vmm; + struct vma_iterator vmi; if (flags & MAP_FIXED) { /* We do not accept a shared mapping if it would violate @@ -79,15 +80,20 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, else addr = PAGE_ALIGN(addr); - for (vmm = find_vma(current->mm, addr); ; vmm = vmm->vm_next) { - /* At this point: (!vmm || addr < vmm->vm_end). */ - if (TASK_SIZE - len < addr) - return -ENOMEM; - if (!vmm || addr + len <= vm_start_gap(vmm)) - return addr; + vma_iter_init(&vmi, current->mm, addr); + for_each_vma(vmi, vmm) { + /* At this point: (addr < vmm->vm_end). */ + if (addr + len <= vm_start_gap(vmm)) + break; + addr = vmm->vm_end; if (flags & MAP_SHARED) addr = COLOUR_ALIGN(addr, pgoff); } + + if (TASK_SIZE - len < addr) + return -ENOMEM; + + return addr; } #endif -- cgit v1.2.3-59-g8ed1b