aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/cgroup-v1/memory.rst19
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt40
-rw-r--r--Documentation/admin-guide/mm/hugetlbpage.rst35
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst7
-rw-r--r--Documentation/admin-guide/sysctl/vm.rst23
5 files changed, 94 insertions, 30 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 0ae4f564c2d6..12757e63b26c 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -199,11 +199,11 @@ An RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
+A swapped-in page is accounted after adding into swapcache.
Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
+Since page's memcg recorded into swap whatever memsw enabled, the page will
+be accounted after swapin.
At page migration, accounting information is kept.
@@ -222,18 +222,13 @@ the cgroup that brought it in -- this will happen on memory pressure).
But see section 8.2: when moving a task to another cgroup, its pages may
be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and make swapped-out pages of shmem(tmpfs) to
-be backed into memory in force, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.
-
-2.4 Swap Extension (CONFIG_MEMCG_SWAP)
+2.4 Swap Extension
--------------------------------------
-Swap Extension allows you to record charge for swap. A swapped-in page is
-charged back to original page allocator if possible.
+Swap usage is always recorded for each of cgroup. Swap Extension allows you to
+read and limit it.
-When swap is accounted, following files are added.
+When CONFIG_SWAP is enabled, following files are added.
- memory.memsw.usage_in_bytes.
- memory.memsw.limit_in_bytes.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a76d83ed5262..0b4b9e1e35b6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -834,12 +834,15 @@
See also Documentation/networking/decnet.rst.
default_hugepagesz=
- [same as hugepagesz=] The size of the default
- HugeTLB page size. This is the size represented by
- the legacy /proc/ hugepages APIs, used for SHM, and
- default size when mounting hugetlbfs filesystems.
- Defaults to the default architecture's huge page size
- if not specified.
+ [HW] The size of the default HugeTLB page. This is
+ the size represented by the legacy /proc/ hugepages
+ APIs. In addition, this is the default hugetlb size
+ used for shmget(), mmap() and mounting hugetlbfs
+ filesystems. If not specified, defaults to the
+ architecture's default huge page size. Huge page
+ sizes are architecture dependent. See also
+ Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: size[KMG]
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
@@ -1484,13 +1487,24 @@
hugepages using the cma allocator. If enabled, the
boot-time allocation of gigantic hugepages is skipped.
- hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
- hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
- On x86-64 and powerpc, this option can be specified
- multiple times interleaved with hugepages= to reserve
- huge pages of different sizes. Valid pages sizes on
- x86-64 are 2M (when the CPU supports "pse") and 1G
- (when the CPU supports the "pdpe1gb" cpuinfo flag).
+ hugepages= [HW] Number of HugeTLB pages to allocate at boot.
+ If this follows hugepagesz (below), it specifies
+ the number of pages of hugepagesz to be allocated.
+ If this is the first HugeTLB parameter on the command
+ line, it specifies the number of pages to allocate for
+ the default huge page size. See also
+ Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: <integer>
+
+ hugepagesz=
+ [HW] The size of the HugeTLB pages. This is used in
+ conjunction with hugepages (above) to allocate huge
+ pages of a specific size at boot. The pair
+ hugepagesz=X hugepages=Y can be specified once for
+ each supported huge page size. Huge page sizes are
+ architecture dependent. See also
+ Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: size[KMG]
hung_task_panic=
[KNL] Should the hung task detector generate panics.
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 1cc0bc78d10e..5026e58826e2 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -100,6 +100,41 @@ with a huge page size selection parameter "hugepagesz=<size>". <size> must
be specified in bytes with optional scale suffix [kKmMgG]. The default huge
page size may be selected with the "default_hugepagesz=<size>" boot parameter.
+Hugetlb boot command line parameter semantics
+hugepagesz - Specify a huge page size. Used in conjunction with hugepages
+ parameter to preallocate a number of huge pages of the specified
+ size. Hence, hugepagesz and hugepages are typically specified in
+ pairs such as:
+ hugepagesz=2M hugepages=512
+ hugepagesz can only be specified once on the command line for a
+ specific huge page size. Valid huge page sizes are architecture
+ dependent.
+hugepages - Specify the number of huge pages to preallocate. This typically
+ follows a valid hugepagesz or default_hugepagesz parameter. However,
+ if hugepages is the first or only hugetlb command line parameter it
+ implicitly specifies the number of huge pages of default size to
+ allocate. If the number of huge pages of default size is implicitly
+ specified, it can not be overwritten by a hugepagesz,hugepages
+ parameter pair for the default size.
+ For example, on an architecture with 2M default huge page size:
+ hugepages=256 hugepagesz=2M hugepages=512
+ will result in 256 2M huge pages being allocated and a warning message
+ indicating that the hugepages=512 parameter is ignored. If a hugepages
+ parameter is preceded by an invalid hugepagesz parameter, it will
+ be ignored.
+default_hugepagesz - Specify the default huge page size. This parameter can
+ only be specified once on the command line. default_hugepagesz can
+ optionally be followed by the hugepages parameter to preallocate a
+ specific number of huge pages of default size. The number of default
+ sized huge pages to preallocate can also be implicitly specified as
+ mentioned in the hugepages section above. Therefore, on an
+ architecture with 2M default huge page size:
+ hugepages=256
+ default_hugepagesz=2M hugepages=256
+ hugepages=256 default_hugepagesz=2M
+ will all result in 256 2M huge pages being allocated. Valid default
+ huge page size is architecture dependent.
+
When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
indicates the current number of pre-allocated huge pages of the default size.
Thus, one can use the following command to dynamically allocate/deallocate
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 2f31de8f7c74..6a233e42be08 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -220,6 +220,13 @@ memory. A lower value can prevent THPs from being
collapsed, resulting fewer pages being collapsed into
THPs, and lower memory access performance.
+``max_ptes_shared`` specifies how many pages can be shared across multiple
+processes. Exceeding the number would block the collapse::
+
+ /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
+
+A higher value may increase memory footprint for some workloads.
+
Boot parameter
==============
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 0329a4d3fa9e..d46d5b7013c6 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -831,14 +831,27 @@ tooling to work, you can do::
swappiness
==========
-This control is used to define how aggressive the kernel will swap
-memory pages. Higher values will increase aggressiveness, lower values
-decrease the amount of swap. A value of 0 instructs the kernel not to
-initiate swap until the amount of free and file-backed pages is less
-than the high water mark in a zone.
+This control is used to define the rough relative IO cost of swapping
+and filesystem paging, as a value between 0 and 200. At 100, the VM
+assumes equal IO cost and will thus apply memory pressure to the page
+cache and swap-backed pages equally; lower values signify more
+expensive swap IO, higher values indicates cheaper.
+
+Keep in mind that filesystem IO patterns under memory pressure tend to
+be more efficient than swap's random IO. An optimal value will require
+experimentation and will also be workload-dependent.
The default value is 60.
+For in-memory swap, like zram or zswap, as well as hybrid setups that
+have swap on faster devices than the filesystem, values beyond 100 can
+be considered. For example, if the random IO against the swap device
+is on average 2x faster than IO from the filesystem, swappiness should
+be 133 (x + 2x = 200, 2x = 133.33).
+
+At 0, the kernel will not initiate swap until the amount of free and
+file-backed pages is less than the high watermark in a zone.
+
unprivileged_userfaultfd
========================