aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2017-07-10 16:58:42 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2017-07-10 16:58:42 -0700
commit9967468c0a109644e4a1f5b39b39bf86fe7507a7 (patch)
tree72b7764c6dd6d74ede25688545a2e0c29dde3a2b /Documentation
parentMerge tag 'devprop-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (diff)
parentkernel/exit.c: avoid undefined behaviour when calling wait4() (diff)
downloadlinux-dev-9967468c0a109644e4a1f5b39b39bf86fe7507a7.tar.xz
linux-dev-9967468c0a109644e4a1f5b39b39bf86fe7507a7.zip
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: - most of the rest of MM - KASAN updates - lib/ updates - checkpatch updates - some binfmt_elf changes - various misc bits * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (115 commits) kernel/exit.c: avoid undefined behaviour when calling wait4() kernel/signal.c: avoid undefined behaviour in kill_something_info binfmt_elf: safely increment argv pointers s390: reduce ELF_ET_DYN_BASE powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB arm64: move ELF_ET_DYN_BASE to 4GB / 4MB arm: move ELF_ET_DYN_BASE to 4MB binfmt_elf: use ELF_ET_DYN_BASE only for PIE fs, epoll: short circuit fetching events if thread has been killed checkpatch: improve multi-line alignment test checkpatch: improve macro reuse test checkpatch: change format of --color argument to --color[=WHEN] checkpatch: silence perl 5.26.0 unescaped left brace warnings checkpatch: improve tests for multiple line function definitions checkpatch: remove false warning for commit reference checkpatch: fix stepping through statements with $stat and ctx_statement_block checkpatch: [HLP]LIST_HEAD is also declaration checkpatch: warn when a MAINTAINERS entry isn't [A-Z]:\t checkpatch: improve the unnecessary OOM message test lib/bsearch.c: micro-optimize pivot position calculation ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/cgroup-v1/memory.txt47
-rw-r--r--Documentation/memory-hotplug.txt12
-rw-r--r--Documentation/sysctl/vm.txt20
3 files changed, 64 insertions, 15 deletions
diff --git a/Documentation/cgroup-v1/memory.txt b/Documentation/cgroup-v1/memory.txt
index 946e69103cdd..cefb63639070 100644
--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -789,23 +789,46 @@ way to trigger. Applications should do whatever they can to help the
system. It might be too late to consult with vmstat or any other
statistics, so it's advisable to take an immediate action.
-The events are propagated upward until the event is handled, i.e. the
-events are not pass-through. Here is what this means: for example you have
-three cgroups: A->B->C. Now you set up an event listener on cgroups A, B
-and C, and suppose group C experiences some pressure. In this situation,
-only group C will receive the notification, i.e. groups A and B will not
-receive it. This is done to avoid excessive "broadcasting" of messages,
-which disturbs the system and which is especially bad if we are low on
-memory or thrashing. So, organize the cgroups wisely, or propagate the
-events manually (or, ask us to implement the pass-through events,
-explaining why would you need them.)
+By default, events are propagated upward until the event is handled, i.e. the
+events are not pass-through. For example, you have three cgroups: A->B->C. Now
+you set up an event listener on cgroups A, B and C, and suppose group C
+experiences some pressure. In this situation, only group C will receive the
+notification, i.e. groups A and B will not receive it. This is done to avoid
+excessive "broadcasting" of messages, which disturbs the system and which is
+especially bad if we are low on memory or thrashing. Group B, will receive
+notification only if there are no event listers for group C.
+
+There are three optional modes that specify different propagation behavior:
+
+ - "default": this is the default behavior specified above. This mode is the
+ same as omitting the optional mode parameter, preserved by backwards
+ compatibility.
+
+ - "hierarchy": events always propagate up to the root, similar to the default
+ behavior, except that propagation continues regardless of whether there are
+ event listeners at each level, with the "hierarchy" mode. In the above
+ example, groups A, B, and C will receive notification of memory pressure.
+
+ - "local": events are pass-through, i.e. they only receive notifications when
+ memory pressure is experienced in the memcg for which the notification is
+ registered. In the above example, group C will receive notification if
+ registered for "local" notification and the group experiences memory
+ pressure. However, group B will never receive notification, regardless if
+ there is an event listener for group C or not, if group B is registered for
+ local notification.
+
+The level and event notification mode ("hierarchy" or "local", if necessary) are
+specified by a comma-delimited string, i.e. "low,hierarchy" specifies
+hierarchical, pass-through, notification for all ancestor memcgs. Notification
+that is the default, non pass-through behavior, does not specify a mode.
+"medium,local" specifies pass-through notification for the medium level.
The file memory.pressure_level is only used to setup an eventfd. To
register a notification, an application must:
- create an eventfd using eventfd(2);
- open memory.pressure_level;
-- write string like "<event_fd> <fd of memory.pressure_level> <level>"
+- write string as "<event_fd> <fd of memory.pressure_level> <level[,mode]>"
to cgroup.event_control.
Application will be notified through eventfd when memory pressure is at
@@ -821,7 +844,7 @@ Test:
# cd /sys/fs/cgroup/memory/
# mkdir foo
# cd foo
- # cgroup_event_listener memory.pressure_level low &
+ # cgroup_event_listener memory.pressure_level low,hierarchy &
# echo 8000000 > memory.limit_in_bytes
# echo 8000000 > memory.memsw.limit_in_bytes
# echo $$ > tasks
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 670f3ded0802..5c628e19d6cd 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -282,20 +282,26 @@ offlined it is possible to change the individual block's state by writing to the
% echo online > /sys/devices/system/memory/memoryXXX/state
This onlining will not change the ZONE type of the target memory block,
-If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+If the memory block doesn't belong to any zone an appropriate kernel zone
+(usually ZONE_NORMAL) will be used unless movable_node kernel command line
+option is specified when ZONE_MOVABLE will be used.
+
+You can explicitly request to associate it with ZONE_MOVABLE by
% echo online_movable > /sys/devices/system/memory/memoryXXX/state
(NOTE: current limit: this memory block must be adjacent to ZONE_MOVABLE)
-And if the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:
% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
(NOTE: current limit: this memory block must be adjacent to ZONE_NORMAL)
+An explicit zone onlining can fail (e.g. when the range is already within
+and existing and incompatible zone already).
+
After this, memory block XXX's state will be 'online' and the amount of
available memory will be increased.
-Currently, newly added memory is added as ZONE_NORMAL (for powerpc, ZONE_DMA).
This may be changed in future.
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index b4ad97f10b8e..48244c42ff52 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -240,6 +240,26 @@ fragmentation index is <= extfrag_threshold. The default value is 500.
==============================================================
+highmem_is_dirtyable
+
+Available only for systems with CONFIG_HIGHMEM enabled (32b systems).
+
+This parameter controls whether the high memory is considered for dirty
+writers throttling. This is not the case by default which means that
+only the amount of memory directly visible/usable by the kernel can
+be dirtied. As a result, on systems with a large amount of memory and
+lowmem basically depleted writers might be throttled too early and
+streaming writes can get very slow.
+
+Changing the value to non zero would allow more memory to be dirtied
+and thus allow writers to write more data which can be flushed to the
+storage more effectively. Note this also comes with a risk of pre-mature
+OOM killer because some writers (e.g. direct block device writes) can
+only use the low memory and they can fill it up with dirty data without
+any throttling.
+
+==============================================================
+
hugepages_treat_as_movable
This parameter controls whether we can allocate hugepages from ZONE_MOVABLE