Revert "Revert "mm, thp: restore node-local hugepage allocations""

This reverts commit a8282608c88e08b1782141026eab61204c1e533f. The commit references the original intended semantic for MADV_HUGEPAGE which has subsequently taken on three unique purposes: - enables or disables thp for a range of memory depending on the system's config (is thp "enabled" set to "always" or "madvise"), - determines the synchronous compaction behavior for thp allocations at fault (is thp "defrag" set to "always", "defer+madvise", or "madvise"), and - reverts a previous MADV_NOHUGEPAGE (there is no madvise mode to only clear previous hugepage advice). These are the three purposes that currently exist in 5.2 and over the past several years that userspace has been written around. Adding a NUMA locality preference adds a fourth dimension to an already conflated advice mode. Based on the semantic that MADV_HUGEPAGE has provided over the past several years, there exist workloads that use the tunable based on these principles: specifically that the allocation should attempt to defragment a local node before falling back. It is agreed that remote hugepages typically (but not always) have a better access latency than remote native pages, although on Naples this is at parity for intersocket. The revert commit that this patch reverts allows hugepage allocation to immediately allocate remotely when local memory is fragmented. This is contrary to the semantic of MADV_HUGEPAGE over the past several years: that is, memory compaction should be attempted locally before falling back. The performance degradation of remote hugepages over local hugepages on Rome, for example, is 53.5% increased access latency. For this reason, the goal is to revert back to the 5.2 and previous behavior that would attempt local defragmentation before falling back. With the patch that is reverted by this patch, we see performance degradations at the tail because the allocator happily allocates the remote hugepage rather than even attempting to make a local hugepage available. zone_reclaim_mode is not a solution to this problem since it does not only impact hugepage allocations but rather changes the memory allocation strategy for *all* page allocations. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: David Rientjes <rientjes@google.com> 2019-09-04 12:54:18 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2019-09-28 14:05:38 -0700
commit: ac79f78dab892fcdc11fda8af5cc5e80d09dca8a (patch)
tree: e774a6c484bb2272dbbcb3d7c9253dd92fac5d39 /mm/mempolicy.c
parent: Linux 5.3 (diff)
download: linux-dev-ac79f78dab892fcdc11fda8af5cc5e80d09dca8a.tar.xz
linux-dev-ac79f78dab892fcdc11fda8af5cc5e80d09dca8a.zip
1 files changed, 1 insertions, 1 deletions
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 65e0874fce17..9c9877a43d58 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1734,7 +1734,7 @@ struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
  * freeing by another task.  It is the caller's responsibility to free the
  * extra reference for shared policies.
  */
-struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
+static struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
 						unsigned long addr)
 {
 	struct mempolicy *pol = __get_vma_policy(vma, addr);
author	David Rientjes <rientjes@google.com>	2019-09-04 12:54:18 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2019-09-28 14:05:38 -0700
commit	ac79f78dab892fcdc11fda8af5cc5e80d09dca8a (patch)
tree	e774a6c484bb2272dbbcb3d7c9253dd92fac5d39 /mm/mempolicy.c
parent	Linux 5.3 (diff)
download	linux-dev-ac79f78dab892fcdc11fda8af5cc5e80d09dca8a.tar.xz linux-dev-ac79f78dab892fcdc11fda8af5cc5e80d09dca8a.zip