aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Documentation/admin-guide/mm/ksm.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide/mm/ksm.rst')
-rw-r--r--Documentation/admin-guide/mm/ksm.rst144
1 files changed, 139 insertions, 5 deletions
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 874eb0c77d34..a639cac12477 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -1,5 +1,3 @@
-.. _admin_guide_ksm:
-
=======================
Kernel Samepage Merging
=======================
@@ -9,7 +7,7 @@ Overview
KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y,
added to the Linux kernel in 2.6.32. See ``mm/ksm.c`` for its implementation,
-and http://lwn.net/Articles/306704/ and http://lwn.net/Articles/330589/
+and http://lwn.net/Articles/306704/ and https://lwn.net/Articles/330589/
KSM was originally developed for use with KVM (where it was known as
Kernel Shared Memory), to fit more virtual machines into physical memory,
@@ -22,7 +20,7 @@ content which can be replaced by a single write-protected page (which
is automatically copied if a process later wants to update its
content). The amount of pages that KSM daemon scans in a single pass
and the time between the passes are configured using :ref:`sysfs
-intraface <ksm_sysfs>`
+interface <ksm_sysfs>`
KSM only merges anonymous (private) pages, never pagecache (file) pages.
KSM's merged pages were originally locked into kernel memory, but can now
@@ -52,7 +50,7 @@ with EAGAIN, but more probably arousing the Out-Of-Memory killer.
If KSM is not configured into the running kernel, madvise MADV_MERGEABLE
and MADV_UNMERGEABLE simply fail with EINVAL. If the running kernel was
built with CONFIG_KSM=y, those calls will normally succeed: even if the
-the KSM daemon is not currently running, MADV_MERGEABLE still registers
+KSM daemon is not currently running, MADV_MERGEABLE still registers
the range for whenever the KSM daemon is started; even if the range
cannot contain any pages which KSM could actually merge; even if
MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
@@ -82,6 +80,9 @@ pages_to_scan
how many pages to scan before ksmd goes to sleep
e.g. ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan``.
+ The pages_to_scan value cannot be changed if ``advisor_mode`` has
+ been set to scan-time.
+
Default: 100 (chosen for demonstration purposes)
sleep_millisecs
@@ -157,8 +158,44 @@ stable_node_chains_prune_millisecs
scan. It's a noop if not a single KSM page hit the
``max_page_sharing`` yet.
+smart_scan
+ Historically KSM checked every candidate page for each scan. It did
+ not take into account historic information. When smart scan is
+ enabled, pages that have previously not been de-duplicated get
+ skipped. How often these pages are skipped depends on how often
+ de-duplication has already been tried and failed. By default this
+ optimization is enabled. The ``pages_skipped`` metric shows how
+ effective the setting is.
+
+advisor_mode
+ The ``advisor_mode`` selects the current advisor. Two modes are
+ supported: none and scan-time. The default is none. By setting
+ ``advisor_mode`` to scan-time, the scan time advisor is enabled.
+ The section about ``advisor`` explains in detail how the scan time
+ advisor works.
+
+adivsor_max_cpu
+ specifies the upper limit of the cpu percent usage of the ksmd
+ background thread. The default is 70.
+
+advisor_target_scan_time
+ specifies the target scan time in seconds to scan all the candidate
+ pages. The default value is 200 seconds.
+
+advisor_min_pages_to_scan
+ specifies the lower limit of the ``pages_to_scan`` parameter of the
+ scan time advisor. The default is 500.
+
+adivsor_max_pages_to_scan
+ specifies the upper limit of the ``pages_to_scan`` parameter of the
+ scan time advisor. The default is 30000.
+
The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
+general_profit
+ how effective is KSM. The calculation is explained below.
+pages_scanned
+ how many pages are being scanned for ksm
pages_shared
how many shared pages are being used
pages_sharing
@@ -167,12 +204,21 @@ pages_unshared
how many pages unique but repeatedly checked for merging
pages_volatile
how many pages changing too fast to be placed in a tree
+pages_skipped
+ how many pages did the "smart" page scanning algorithm skip
full_scans
how many times all mergeable areas have been scanned
stable_node_chains
the number of KSM pages that hit the ``max_page_sharing`` limit
stable_node_dups
number of duplicated KSM pages
+ksm_zero_pages
+ how many zero pages that are still mapped into processes were mapped by
+ KSM when deduplicating.
+
+When ``use_zero_pages`` is/was enabled, the sum of ``pages_sharing`` +
+``ksm_zero_pages`` represents the actual number of pages saved by KSM.
+if ``use_zero_pages`` has never been enabled, ``ksm_zero_pages`` is 0.
A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
@@ -184,6 +230,94 @@ The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
be increased accordingly.
+Monitoring KSM profit
+=====================
+
+KSM can save memory by merging identical pages, but also can consume
+additional memory, because it needs to generate a number of rmap_items to
+save each scanned page's brief rmap information. Some of these pages may
+be merged, but some may not be abled to be merged after being checked
+several times, which are unprofitable memory consumed.
+
+1) How to determine whether KSM save memory or consume memory in system-wide
+ range? Here is a simple approximate calculation for reference::
+
+ general_profit =~ ksm_saved_pages * sizeof(page) - (all_rmap_items) *
+ sizeof(rmap_item);
+
+ where ksm_saved_pages equals to the sum of ``pages_sharing`` +
+ ``ksm_zero_pages`` of the system, and all_rmap_items can be easily
+ obtained by summing ``pages_sharing``, ``pages_shared``, ``pages_unshared``
+ and ``pages_volatile``.
+
+2) The KSM profit inner a single process can be similarly obtained by the
+ following approximate calculation::
+
+ process_profit =~ ksm_saved_pages * sizeof(page) -
+ ksm_rmap_items * sizeof(rmap_item).
+
+ where ksm_saved_pages equals to the sum of ``ksm_merging_pages`` and
+ ``ksm_zero_pages``, both of which are shown under the directory
+ ``/proc/<pid>/ksm_stat``, and ksm_rmap_items is also shown in
+ ``/proc/<pid>/ksm_stat``. The process profit is also shown in
+ ``/proc/<pid>/ksm_stat`` as ksm_process_profit.
+
+From the perspective of application, a high ratio of ``ksm_rmap_items`` to
+``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
+administrators have to rethink how to change madvise policy. Giving an example
+for reference, a page's size is usually 4K, and the rmap_item's size is
+separately 32B on 32-bit CPU architecture and 64B on 64-bit CPU architecture.
+so if the ``ksm_rmap_items/ksm_merging_pages`` ratio exceeds 64 on 64-bit CPU
+or exceeds 128 on 32-bit CPU, then the app's madvise policy should be dropped,
+because the ksm profit is approximately zero or negative.
+
+Monitoring KSM events
+=====================
+
+There are some counters in /proc/vmstat that may be used to monitor KSM events.
+KSM might help save memory, it's a tradeoff by may suffering delay on KSM COW
+or on swapping in copy. Those events could help users evaluate whether or how
+to use KSM. For example, if cow_ksm increases too fast, user may decrease the
+range of madvise(, , MADV_MERGEABLE).
+
+cow_ksm
+ is incremented every time a KSM page triggers copy on write (COW)
+ when users try to write to a KSM page, we have to make a copy.
+
+ksm_swpin_copy
+ is incremented every time a KSM page is copied when swapping in
+ note that KSM page might be copied when swapping in because do_swap_page()
+ cannot do all the locking needed to reconstitute a cross-anon_vma KSM page.
+
+Advisor
+=======
+
+The number of candidate pages for KSM is dynamic. It can be often observed
+that during the startup of an application more candidate pages need to be
+processed. Without an advisor the ``pages_to_scan`` parameter needs to be
+sized for the maximum number of candidate pages. The scan time advisor can
+changes the ``pages_to_scan`` parameter based on demand.
+
+The advisor can be enabled, so KSM can automatically adapt to changes in the
+number of candidate pages to scan. Two advisors are implemented: none and
+scan-time. With none, no advisor is enabled. The default is none.
+
+The scan time advisor changes the ``pages_to_scan`` parameter based on the
+observed scan times. The possible values for the ``pages_to_scan`` parameter is
+limited by the ``advisor_max_cpu`` parameter. In addition there is also the
+``advisor_target_scan_time`` parameter. This parameter sets the target time to
+scan all the KSM candidate pages. The parameter ``advisor_target_scan_time``
+decides how aggressive the scan time advisor scans candidate pages. Lower
+values make the scan time advisor to scan more aggresively. This is the most
+important parameter for the configuration of the scan time advisor.
+
+The initial value and the maximum value can be changed with
+``advisor_min_pages_to_scan`` and ``advisor_max_pages_to_scan``. The default
+values are sufficient for most workloads and use cases.
+
+The ``pages_to_scan`` parameter is re-calculated after a scan has been completed.
+
+
--
Izik Eidus,
Hugh Dickins, 17 Nov 2009