Age | Commit message | Author | Files | Lines
2017-11-17 | lib/int_sqrt: optimize initial value compute | Peter Zijlstra | 1 | -4/+2
The initial value (@m) compute is:

    m = 1UL << (BITS_PER_LONG - 2);
    while (m > x)
        m >>= 2;

Which is a linear search for the highest even bit smaller or equal to @x.
We can implement this using a binary search using __fls() (or better when
it's hardware implemented).

    m = 1UL << (__fls(x) & ~1UL);

Especially for small values of @x, which are the more common arguments
when doing a CDF on idle times, the linear search is near to worst case,
while the binary search of __fls() is a constant 6 (or 5 on 32bit)
branches.

                  cycles:                 branches:               branch-misses:

    PRE:
    hot:    43.633557 +- 0.034373   45.333132 +- 0.002277   0.023529 +- 0.000681
    cold:  207.438411 +- 0.125840   45.333132 +- 0.002277   6.976486 +- 0.004219

    SOFTWARE FLS:
    hot:    29.576176 +- 0.028850   26.666730 +- 0.004511   0.019463 +- 0.000663
    cold:  165.947136 +- 0.188406   26.666746 +- 0.004511   6.133897 +- 0.004386

    HARDWARE FLS:
    hot:    24.720922 +- 0.025161   20.666784 +- 0.004509   0.020836 +- 0.000677
    cold:  132.777197 +- 0.127471   20.666776 +- 0.004509   5.080285 +- 0.003874

Averages computed over all values <128k using a LFSR to generate order.
Cold numbers have a LFSR based branch trace buffer 'confuser' run between
each int_sqrt() invocation.

Link: http://lkml.kernel.org/r/20171020164644.936577234@infradead.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Joe Perches <joe@perches.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Anshul Garg <aksgarg1989@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Michael Davidson <md@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
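The claim that both computations pick the same starting bit can be checked in userspace. What follows is only a sketch under stated assumptions: `highest_bit()` is a hypothetical, portable stand-in for the kernel's `__fls()` (it is a plain loop, so it shows equivalence, not the speedup), the function names are invented for illustration, and the argument is assumed to be at least 1 because `__fls(0)` is undefined.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for __fls(): index of the highest set bit, x != 0. */
static unsigned long highest_bit(unsigned long x)
{
    unsigned long idx = 0;

    while (x >>= 1)
        idx++;
    return idx;
}

/* Old scheme: walk down from the top even bit until m <= x. */
unsigned long initial_m_linear(unsigned long x)
{
    unsigned long m = 1UL << (SKETCH_BITS_PER_LONG - 2);

    while (m > x)
        m >>= 2;
    return m;
}

/* New scheme: round the index of the highest set bit down to an even number. */
unsigned long initial_m_fls(unsigned long x)
{
    return 1UL << (highest_bit(x) & ~1UL);
}
```

For any x >= 1 the two functions should return the same power of four, for example 256 for x = 1000.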
2017-11-17 | lib/int_sqrt: optimize small argument | Peter Zijlstra | 1 | -0/+3
The current int_sqrt() computation is sub-optimal for the case of small
@x. This is the interesting case when we're going to do cumulative
distribution functions on idle times, which we assume to be a random
variable, where the target residency of the deepest idle state gives an
upper bound on the variable (5e6ns on recent Intel chips).

In the case of small @x, the compute loop:

    while (m != 0) {
        b = y + m;
        y >>= 1;

        if (x >= b) {
            x -= b;
            y += m;
        }
        m >>= 2;
    }

can be reduced to:

    while (m > x)
        m >>= 2;

Because y==0, b==m and until x>=m y will remain 0.

And while this is computationally equivalent, it runs much faster
because there's less code, in particular fewer branches.

                  cycles:                 branches:               branch-misses:

    OLD:
    hot:    45.109444 +- 0.044117   44.333392 +- 0.002254   0.018723 +- 0.000593
    cold:  187.737379 +- 0.156678   44.333407 +- 0.002254   6.272844 +- 0.004305

    PRE:
    hot:    67.937492 +- 0.064124   66.999535 +- 0.000488   0.066720 +- 0.001113
    cold:  232.004379 +- 0.332811   66.999527 +- 0.000488   6.914634 +- 0.006568

    POST:
    hot:    43.633557 +- 0.034373   45.333132 +- 0.002277   0.023529 +- 0.000681
    cold:  207.438411 +- 0.125840   45.333132 +- 0.002277   6.976486 +- 0.004219

Averages computed over all values <128k using a LFSR to generate order.
Cold numbers have a LFSR based branch trace buffer 'confuser' run between
each int_sqrt() invocation.

Link: http://lkml.kernel.org/r/20171020164644.876503355@infradead.org
Fixes: 30493cc9dddb ("lib/int_sqrt.c: optimize square root algorithm")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Anshul Garg <aksgarg1989@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Davidson <md@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
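To see how the short pre-loop fits in front of the main loop, here is a userspace sketch assembled from the two loops quoted in the message. It is not the kernel source: the name `int_sqrt_sketch()`, the `SQRT_BITS_PER_LONG` macro and the early return for x <= 1 are assumptions made for this illustration.

```c
#include <limits.h>

#define SQRT_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Integer square root, rounded down (illustrative sketch). */
unsigned long int_sqrt_sketch(unsigned long x)
{
    unsigned long b, m, y = 0;

    if (x <= 1)
        return x;

    m = 1UL << (SQRT_BITS_PER_LONG - 2);

    /* Shortcut: while m > x the main loop would change nothing but m. */
    while (m > x)
        m >>= 2;

    /* Main loop: one result bit per iteration. */
    while (m != 0) {
        b = y + m;
        y >>= 1;

        if (x >= b) {
            x -= b;
            y += m;
        }
        m >>= 2;
    }
    return y;
}
```

With x = 16 the pre-loop stops at m = 16, so the main loop runs three iterations instead of one per even bit of an unsigned long.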
2017-11-17 | lib/test: delete five error messages for failed memory allocations | Markus Elfring | 3 | -15/+7
Omit extra messages for a memory allocation failure in these functions. This issue was detected by using the Coccinelle software. Link: http://lkml.kernel.org/r/410a4c5a-4ee0-6fcc-969c-103d8e496b78@users.sourceforge.net Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | lib: add module support to string tests | Geert Uytterhoeven | 4 | -142/+143
Extract the string test code into its own source file, to allow compiling it either to a loadable module, or built into the kernel. Fixes: 03270c13c5ffaa6a ("lib/string.c: add testcases for memset16/32/64") Link: http://lkml.kernel.org/r/1505397744-3387-1-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | include/linux/radix-tree.h: remove unneeded #include <linux/bug.h> | Masahiro Yamada | 1 | -1/+0
This include was added by commit 187f1882b5b0 ("BUG: headers with BUG/BUG_ON etc. need linux/bug.h") because BUG_ON() was used in this header at that time. Some time later, commit 6d75f366b924 ("lib: radix-tree: check accounting of existing slot replacement users") removed the use of BUG_ON() from this header. Since then, there is no reason to include <linux/bug.h>. Link: http://lkml.kernel.org/r/1505660151-4383-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Mi <chrism@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | include/linux/bitfield.h: include <linux/build_bug.h> instead of <linux/bug.h> | Masahiro Yamada | 1 | -1/+1
Since commit bc6245e5efd7 ("bug: split BUILD_BUG stuff out into <linux/build_bug.h>"), it is better to #include <linux/build_bug.h>, which pulls in only the minimal headers needed for the BUILD_BUG() family. Link: http://lkml.kernel.org/r/1505700775-19826-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Dinan Gunawardena <dinan.gunawardena@netronome.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | get_maintainer: add more --self-test options | Joe Perches | 1 | -17/+132
Add tests for duplicate section headers, missing section content, link
and scm reachability.

Miscellanea:

o Add --self-test=<foo> options (a comma separated list of any of
  sections, patterns, links or scm) where the default without options
  is all tests
o Rename check_maintainers_patterns to self_test
o Rename self_test_pattern_info to self_test_info

[tom.saeger@oracle.com: improvements]
Link: http://lkml.kernel.org/r/13e3986c374902fcf08ae947e36c5c608bbe3b79.1510075301.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | get_maintainer: add --self-test for internal consistency tests | Tom Saeger | 1 | -17/+77
Add "--self-test" option to get_maintainer.pl to show potential issues in MAINTAINERS file(s) content. Pattern check warnings are shown for "F" and "X" patterns found in MAINTAINERS file(s) which do not match any files known by git. Link: http://lkml.kernel.org/r/64994f911b3510d0f4c8ac2e113501dfcec1f3c9.1509559540.git.tom.saeger@oracle.com Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | dynamic_debug documentation: minor fixes | Randy Dunlap | 1 | -3/+3
Fix minor typo. Fix missing words in explaining parsing of last line number. Link: http://lkml.kernel.org/r/ebb7ff42-4945-103f-d5b4-f07a6f3343a7@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0 | Randy Dunlap | 1 | -0/+4
line-range is supposed to treat "1-" as "1-endoffile", so handle the
special case by setting last_lineno to UINT_MAX.

Fixes this error:

    dynamic_debug:ddebug_parse_query: last-line:0 < 1st-line:1
    dynamic_debug:ddebug_exec_query: query parse failed

Link: http://lkml.kernel.org/r/10a6a101-e2be-209f-1f41-54637824788e@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
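The special case can be illustrated with a small userspace parser. This is a sketch only and not the dynamic_debug code: `parse_line_range()` is a hypothetical helper, and the only behaviour taken from the message is that an omitted end ("1-") maps to UINT_MAX instead of 0, so the "last-line < 1st-line" check no longer trips.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse "N", "N-M" or "N-" into [first, last].
 * Returns 0 on success, -1 on a malformed or inverted range.
 */
int parse_line_range(const char *s, unsigned int *first, unsigned int *last)
{
    char *end;
    unsigned long v;

    if (!isdigit((unsigned char)*s))
        return -1;
    v = strtoul(s, &end, 10);
    if (v > UINT_MAX)
        return -1;
    *first = (unsigned int)v;

    if (*end == '\0') {          /* "N": single line */
        *last = *first;
        return 0;
    }
    if (*end != '-')
        return -1;

    s = end + 1;
    if (*s == '\0') {            /* "N-": open-ended, runs to end of file */
        *last = UINT_MAX;
        return 0;
    }
    if (!isdigit((unsigned char)*s))
        return -1;
    v = strtoul(s, &end, 10);
    if (*end != '\0' || v > UINT_MAX)
        return -1;
    *last = (unsigned int)v;

    return *last < *first ? -1 : 0;
}
```

Had the open-ended case stored 0 in `last`, the final comparison would reject "1-", which is the failure quoted in the message.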
2017-11-17 | kernel/umh.c: optimize 'proc_cap_handler()' | Christophe JAILLET | 1 | -2/+2
If 'write' is 0, we can avoid a call to spin_lock/spin_unlock. Link: http://lkml.kernel.org/r/20171020193331.7233-1-christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | include/linux/compiler-clang.h: handle randomizable anonymous structs | Sandipan Das | 1 | -0/+3
The GCC randomize layout plugin can randomize the member offsets of sensitive kernel data structures. To use this feature, certain annotations and members are added to the structures which affect the member offsets even if this plugin is not used. All of these structures are completely randomized, except for task_struct which leaves out some of its members. All the other members are wrapped within an anonymous struct with the __randomize_layout attribute. This is done using the randomized_struct_fields_start and randomized_struct_fields_end defines. When the plugin is disabled, the behaviour of this attribute can vary based on the GCC version. For GCC 5.1+, this attribute maps to __designated_init otherwise it is just an empty define but the anonymous structure is still present. For other compilers, both randomized_struct_fields_start and randomized_struct_fields_end default to empty defines meaning the anonymous structure is not introduced at all. So, if a module compiled with Clang, such as a BPF program, needs to access task_struct fields such as pid and comm, the offsets of these members as recognized by Clang are different from those recognized by modules compiled with GCC. If GCC 4.6+ is used to build the kernel, this can be solved by introducing appropriate defines for Clang so that the anonymous structure is seen when determining the offsets for the members. Link: http://lkml.kernel.org/r/20171109064645.25581-1-sandipan@linux.vnet.ibm.com Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
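Why the mere presence of the anonymous structure can move member offsets is easy to show with a toy layout. The two structs below are hypothetical and far smaller than task_struct; the field names are borrowed for flavour only. The sketch assumes a C11 compiler (anonymous struct members) and a platform where `long` is aligned more strictly than `char`.

```c
#include <stddef.h>

/* Layout seen when the start/end defines expand to nothing. */
struct flat_task {
    char state;
    char pid;
    long flags;
};

/* Layout seen when the same fields sit inside an anonymous struct. */
struct wrapped_task {
    char state;
    struct {
        char pid;
        long flags;
    };
};

size_t flat_pid_offset(void)
{
    return offsetof(struct flat_task, pid);
}

size_t wrapped_pid_offset(void)
{
    /* The inner struct inherits the alignment of 'long', so it is padded. */
    return offsetof(struct wrapped_task, pid);
}
```

In the flat layout `pid` directly follows `state`, while in the wrapped layout it starts at the inner struct's alignment boundary. Two compilers that disagree on whether the wrapper exists therefore disagree on where `pid` lives.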
2017-11-17 | bug: fix "cut here" location for __WARN_TAINT architectures | Kees Cook | 2 | -3/+18
Prior to v4.11, x86 used warn_slowpath_fmt() for handling WARN()s. After
WARN() was moved to using UD0 on x86, the warning text started appearing
_before_ the "cut here" line. This appears to have been a long-standing
bug on architectures that used __WARN_TAINT, but it didn't get fixed.

v4.11 and earlier on x86:

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 2956 at drivers/misc/lkdtm_bugs.c:65 lkdtm_WARNING+0x21/0x30
    This is a warning message
    Modules linked in:

v4.12 and later on x86:

    This is a warning message
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 2982 at drivers/misc/lkdtm_bugs.c:68 lkdtm_WARNING+0x15/0x20
    Modules linked in:

With this fix:

    ------------[ cut here ]------------
    This is a warning message
    WARNING: CPU: 3 PID: 3009 at drivers/misc/lkdtm_bugs.c:67 lkdtm_WARNING+0x15/0x20

Since the __FILE__ reporting happens as part of the UD0 handler, it
isn't trivial to move the message to after the WARNING line, but at
least we can fix the position of the "cut here" line so all the various
logging tools will start including the actual runtime warning message
again, when they follow the instruction and "cut here".

Link: http://lkml.kernel.org/r/1510100869-73751-4-git-send-email-keescook@chromium.org
Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0")
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | bug: define the "cut here" string in a single place | Kees Cook | 4 | -3/+5
The "cut here" string is used in a few paths. Define it in a single place. Link: http://lkml.kernel.org/r/1510100869-73751-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | lkdtm: include WARN format string | Kees Cook | 1 | -1/+3
In order to test the ordering of WARN format strings, actually include one in LKDTM. Link: http://lkml.kernel.org/r/1510100869-73751-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | iopoll: avoid -Wint-in-bool-context warning | Arnd Bergmann | 1 | -9/+15
When we pass the result of a multiplication as the timeout or the delay,
we can get a warning from gcc-7:

    drivers/mmc/host/bcm2835.c:596:149: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
    drivers/mfd/arizona-core.c:247:195: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
    drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c:49:27: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]

The warning is a bit questionable inside of a macro, but this is
intentional on the side of the gcc developers. It is also an indication
of another problem: we evaluate the timeout and sleep arguments multiple
times, which can have undesired side-effects when those are complex
expressions.

This changes the two iopoll variants to use local variables for storing
copies of the timeouts. This adds some more type safety, and avoids both
the double-evaluation and the gcc warning.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81484
Link: http://lkml.kernel.org/r/20170726133756.2161367-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20171102114048.1526955-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
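The multiple-evaluation problem, and the local-copy remedy, can be reproduced with two toy macros. These are not the iopoll macros: `POLL_NAIVE`, `POLL_LOCAL_COPY` and the counting helper are hypothetical names, and the loop counts spins instead of reading a register or sleeping.

```c
static unsigned int timeout_evals;

/* Timeout expression with a visible side effect: counts its evaluations. */
static unsigned long counted_timeout(void)
{
    timeout_evals++;
    return 3;
}

/* Pre-fix shape: the timeout argument is expanded inside the loop. */
#define POLL_NAIVE(cond, timeout)                       \
do {                                                    \
    unsigned long spins_ = 0;                           \
    while (!(cond)) {                                   \
        if ((timeout) && spins_ >= (timeout))           \
            break;                                      \
        spins_++;                                       \
    }                                                   \
} while (0)

/* Post-fix shape: the argument is copied once into a typed local. */
#define POLL_LOCAL_COPY(cond, timeout)                  \
do {                                                    \
    unsigned long timeout_ = (timeout);                 \
    unsigned long spins_ = 0;                           \
    while (!(cond)) {                                   \
        if (timeout_ && spins_ >= timeout_)             \
            break;                                      \
        spins_++;                                       \
    }                                                   \
} while (0)

/* Number of times the timeout expression runs with the naive macro. */
unsigned int evals_with_naive_macro(void)
{
    timeout_evals = 0;
    POLL_NAIVE(0, counted_timeout());
    return timeout_evals;
}

/* Number of times the timeout expression runs with the local copy. */
unsigned int evals_with_local_copy(void)
{
    timeout_evals = 0;
    POLL_LOCAL_COPY(0, counted_timeout());
    return timeout_evals;
}
```

With the local copy the expression runs exactly once, and the boolean test is applied to a plain variable, not to a `*` expression, which is the construct gcc-7 warns about.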
2017-11-17 | parse-maintainers: add ability to specify filenames | Joe Perches | 1 | -5/+47
parse-maintainers.pl is convenient, but currently hard-codes the filenames that are used. Allow user-specified filenames to simplify the use of the script. Link: http://lkml.kernel.org/r/48703c068b3235223ffa3b2eb268fa0a125b25e0.1502251549.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | kernel debug: support resetting WARN_ONCE for all architectures | Andi Kleen | 3 | -1/+30
Some architectures store the WARN_ONCE state in the flags field of the bug_entry. Clear that one too when resetting once state through /sys/kernel/debug/clear_warn_once. Pointed out by Michael Ellerman. Improves the earlier patch that adds clear_warn_once. [ak@linux.intel.com: add a missing ifdef CONFIG_MODULES] Link: http://lkml.kernel.org/r/20171020170633.9593-1-andi@firstfloor.org [akpm@linux-foundation.org: fix unused var warning] [akpm@linux-foundation.org: Use 0200 for clear_warn_once file, per mpe] [akpm@linux-foundation.org: clear BUGFLAG_DONE in clear_once_table(), per mpe] Link: http://lkml.kernel.org/r/20171019204642.7404-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | kernel debug: support resetting WARN*_ONCE | Andi Kleen | 5 | -3/+42
I like _ONCE warnings because it's guaranteed that they don't flood the
log.

During testing I find it useful to reset the state of the once warnings,
so that I can rerun tests and see if they trigger again, or can
guarantee that a test run always hits the same warnings.

This patch adds a debugfs interface to reset all the _ONCE warnings so
that they appear again:

    echo 1 > /sys/kernel/debug/clear_warn_once

This is implemented by putting all the warning booleans into a special
section, and clearing it.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20171017221455.6740-1-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | sh/boot: add static stack-protector to pre-kernel | Kees Cook | 1 | -0/+14
The sh decompressor code triggers stack-protector code generation when using CONFIG_CC_STACKPROTECTOR_STRONG. As done for arm and mips, add a simple static stack-protector canary. As this wasn't protected before, the risk of using a weak canary is minimized. Once the kernel is actually up, a better canary is chosen. Link: http://lkml.kernel.org/r/1506972007-80614-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <mmarek@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | spelling.txt: add "unnecessary" typo variants | Joe Perches | 1 | -0/+4
Add unnecessary typos by copying the necessary typos. Link: http://lkml.kernel.org/r/1505074722.22023.6.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | proc: use do-while in name_to_int() | Alexey Dobriyan | 1 | -2/+2
Gcc doesn't know that "len" is guaranteed to be >=1 by dcache and
generates standard while-loop prologue duplicating loop condition.

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27)
    function                                     old     new   delta
    name_to_int                                  104      77     -27

Link: http://lkml.kernel.org/r/20170912195213.GB17730@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
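The shape of the loop being discussed can be sketched in userspace. This is an approximation written for illustration, not the fs/proc source: `name_to_int_sketch()` takes a plain pointer and length where the kernel function takes a qstr, and it relies on the same precondition the message mentions, namely that `len` is at least 1.

```c
/*
 * Parse a canonical decimal name of 'len' >= 1 characters.
 * Returns the value, or ~0U if the name is not a valid number.
 */
unsigned int name_to_int_sketch(const char *name, int len)
{
    unsigned int n = 0;

    /* Reject leading zeros such as "01". */
    if (len > 1 && *name == '0')
        return ~0U;

    /* do-while: the first pass needs no 'len > 0' test because len >= 1. */
    do {
        unsigned int c = (unsigned int)(*name++ - '0');

        if (c > 9)
            return ~0U;
        if (n >= (~0U - 9) / 10)
            return ~0U;          /* would overflow on the next digit */
        n = n * 10 + c;
    } while (--len > 0);

    return n;
}
```

With a `while (len-- > 0)` loop the compiler has to emit an extra check before the first iteration, which is presumably the duplicated condition the message refers to.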
2017-11-17 | proc: uninline name_to_int() | Alexey Dobriyan | 3 | -22/+25
Save ~360 bytes.

    add/remove: 1/0 grow/shrink: 0/4 up/down: 104/-463 (-359)
    function                                     old     new   delta
    name_to_int                                    -     104    +104
    proc_pid_lookup                              217     126     -91
    proc_lookupfd_common                         212     121     -91
    proc_task_lookup                             289     194     -95
    __proc_create                                588     402    -186

Link: http://lkml.kernel.org/r/20170912194850.GA17730@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | proc, coredump: add CoreDumping flag to /proc/pid/status | Roman Gushchin | 2 | -0/+9
Right now there is no convenient way to check if a process is being
coredumped at the moment.

It might be necessary to recognize such state to prevent killing the
process and getting a broken coredump. Writing a large core might take
significant time, and the process is unresponsive during it, so it might
be killed by timeout, if another process is monitoring and
killing/restarting hanging tasks.

We're getting a significant number of corrupted coredump files on
machines in our fleet, just because processes are being killed by
timeout in the middle of the core writing process.

We do have a process health check, and some agent is responsible for
restarting processes which are not responding for health check requests.
Writing a large coredump to the disk can easily exceed the reasonable
timeout (especially on an overloaded machine).

This flag will allow the agent to distinguish processes which are being
coredumped, extend the timeout for them, and let them produce a full
coredump file.

To provide an ability to detect if a process is in the state of being
coredumped, we can expose a boolean CoreDumping flag in
/proc/pid/status.

Example:

    $ cat core.sh
      #!/bin/sh

      echo "|/usr/bin/sleep 10" > /proc/sys/kernel/core_pattern
      sleep 1000 &
      PID=$!

      cat /proc/$PID/status | grep CoreDumping
      kill -ABRT $PID
      sleep 1
      cat /proc/$PID/status | grep CoreDumping

    $ ./core.sh
      CoreDumping:    0
      CoreDumping:    1

[guro@fb.com: document CoreDumping flag in /proc/<pid>/status]
Link: http://lkml.kernel.org/r/20170928135357.GA8470@castle.DHCP.thefacebook.com
Link: http://lkml.kernel.org/r/20170920230634.31572-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
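A monitoring agent of the kind described could read the flag with a few lines of C. The helper below is a sketch under assumptions: `coredumping_from_status()` is a hypothetical name, it works on the text of /proc/<pid>/status already read into memory, and it assumes the field appears at the start of a line as `CoreDumping:` followed by whitespace and 0 or 1, as in the example output.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 or 1 from the CoreDumping line, or -1 if the field is absent. */
int coredumping_from_status(const char *status)
{
    static const char key[] = "CoreDumping:";
    const char *line = status;

    while (line && *line) {
        if (strncmp(line, key, sizeof(key) - 1) == 0)
            return strtol(line + sizeof(key) - 1, NULL, 10) != 0;

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

An agent could treat a return value of 1 as a reason to extend its health-check timeout, and -1 as a sign that the running kernel does not expose the field.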
2017-11-17 | mm, compaction: remove unneeded pageblock_skip_persistent() checks | Vlastimil Babka | 1 | -15/+3
Commit f3c931633a59 ("mm, compaction: persistently skip hugetlbfs pageblocks") has introduced pageblock_skip_persistent() checks into migration and free scanners, to make sure pageblocks that should be persistently skipped are marked as such, regardless of the ignore_skip_hint flag. Since the previous patch introduced a new no_set_skip_hint flag, the ignore flag no longer prevents marking pageblocks as skipped. Therefore we can remove the special cases. The relevant pageblocks will be marked as skipped by the common logic which marks each pageblock where no page could be isolated. This makes the code simpler. Link: http://lkml.kernel.org/r/20171102121706.21504-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | mm, compaction: split off flag for not updating skip hints | Vlastimil Babka | 3 | -1/+3
Pageblock skip hints were added as a heuristic for compaction, which shares core code with CMA. Since CMA reliability would suffer from the heuristics, compact_control flag ignore_skip_hint was added for the CMA use case. Since 6815bf3f233e ("mm/compaction: respect ignore_skip_hint in update_pageblock_skip") the flag also means that CMA won't *update* the skip hints in addition to ignoring them. Today, direct compaction can also ignore the skip hints in the last resort attempt, but there's no reason not to set them when isolation fails in such case. Thus, this patch splits off a new no_set_skip_hint flag to avoid the updating, which only CMA sets. This should improve the heuristics a bit, and allow us to simplify the persistent skip bit handling as the next step. Link: http://lkml.kernel.org/r/20171102121706.21504-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | mm, compaction: extend pageblock_skip_persistent() to all compound pages | Vlastimil Babka | 1 | -11/+14
pageblock_skip_persistent() checks for HugeTLB pages of pageblock order. When clearing pageblock skip bits for compaction, the bits are not cleared for such pageblocks, because they cannot contain base pages suitable for migration, nor free pages to use as migration targets. This optimization can be simply extended to all compound pages of order equal or larger than pageblock order, because migrating such pages (if they support it) cannot help sub-pageblock fragmentation. This includes THP's and also gigantic HugeTLB pages, which the current implementation doesn't persistently skip due to a strict pageblock_order equality check and not recognizing tail pages. While THP pages are generally less "persistent" than HugeTLB, we can still expect that if a THP exists at the point of __reset_isolation_suitable(), it will exist also during the subsequent compaction run. The time difference here could be actually smaller than between a compaction run that sets a (non-persistent) skip bit on a THP, and the next compaction run that observes it. Link: http://lkml.kernel.org/r/20171102121706.21504-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | mm, compaction: persistently skip hugetlbfs pageblocks | David Rientjes | 2 | -12/+55
It is pointless to migrate hugetlb memory as part of memory compaction if the hugetlb size is equal to the pageblock order. No defragmentation is occurring in this condition. It is also pointless for the freeing scanner to scan a pageblock where a hugetlb page is pinned. Unconditionally skip these pageblocks, and do so persistently so that they are not rescanned until it is observed that these hugepages are no longer pinned. It would also be possible to do this by involving the hugetlb subsystem in marking pageblocks to no longer be skipped when their hugetlb pages are freed. This is a simple solution that doesn't involve any additional subsystems in pageblock skip manipulation. [rientjes@google.com: fix build] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708201734390.117182@chino.kir.corp.google.com Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708151639130.106658@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Tested-by: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | mm, compaction: kcompactd should not ignore pageblock skip | David Rientjes | 1 | -2/+1
Kcompactd is needlessly ignoring pageblock skip information. It is doing MIGRATE_SYNC_LIGHT compaction, which is no more powerful than MIGRATE_SYNC compaction. If compaction recently failed to isolate memory from a set of pageblocks, there is nothing to indicate that kcompactd will be able to do so, or that attempting to isolate memory is beneficial. Use the pageblock skip hint to avoid rescanning pageblocks needlessly until that information is reset. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708151638550.106658@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | mm: shmem: remove unused info variable | Corentin Labbe | 1 | -2/+0
Fix the following warning by removing the unused variable: mm/shmem.c:3205:27: warning: variable 'info' set but not used [-Wunused-but-set-variable] Link: http://lkml.kernel.org/r/1510774029-30652-1-git-send-email-clabbe@baylibre.com Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | lib/dma-debug.c: fix incorrect pfn calculation | Miles Chen | 1 | -2/+18
dma-debug reports the following warning:

    WARNING: CPU: 3 PID: 298 at kernel-4.4/lib/dma-debug.c:604 debug_dma_assert_idle+0x1a8/0x230()
    DMA-API: cpu touching an active dma mapped cacheline [cln=0x00000882300]
    CPU: 3 PID: 298 Comm: vold Tainted: G W O 4.4.22+ #1
    Hardware name: MT6739 (DT)
    Call trace:
      debug_dma_assert_idle+0x1a8/0x230
      wp_page_copy.isra.96+0x118/0x520
      do_wp_page+0x4fc/0x534
      handle_mm_fault+0xd4c/0x1310
      do_page_fault+0x1c8/0x394
      do_mem_abort+0x50/0xec

I found that debug_dma_alloc_coherent() and debug_dma_free_coherent()
assume that dma_alloc_coherent() always returns a linear address.

However it's possible that dma_alloc_coherent() returns a non-linear
address. In this case, page_to_pfn(virt_to_page(virt)) will return an
incorrect pfn. If the pfn is valid and mapped as a COW page, we will hit
the warning when doing wp_page_copy().

Fix this by calculating pfn for linear and non-linear addresses.

[miles.chen@mediatek.com: v4]
Link: http://lkml.kernel.org/r/1510872972-23919-1-git-send-email-miles.chen@mediatek.com
Link: http://lkml.kernel.org/r/1506484087-1177-1-git-send-email-miles.chen@mediatek.com
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 | mm/z3fold.c: use kref to prevent page free/compact race | Vitaly Wool | 1 | -2/+8
There is a race in the current z3fold implementation between do_compact() called in a work queue context and the page release procedure when page's kref goes to 0. do_compact() may be waiting for page lock, which is released by release_z3fold_page_locked right before putting the page onto the "stale" list, and then the page may be freed as do_compact() modifies its contents. The mechanism currently implemented to handle that (checking the PAGE_STALE flag) is not reliable enough. Instead, we'll use page's kref counter to guarantee that the page is not released if its compaction is scheduled. It then becomes compaction function's responsibility to decrease the counter and quit immediately if the page was actually freed. Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com Signed-off-by: Vitaly Wool <vitaly.wool@sonymobile.com> Cc: <Oleksiy.Avramchenko@sony.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17mm: fix nodemask printingArnd Bergmann1-3/+10
The cleanup caused build warnings for constant mask pointers: mm/mempolicy.c: In function `mpol_to_str': ./include/linux/nodemask.h:108:11: warning: the comparison will always evaluate as `true' for the address of `nodes' will never be NULL [-Waddress] An earlier workaround I suggested was incorporated in the version that got merged, but that only solved the problem for gcc-7 and higher, while gcc-4.6 through gcc-6.x still warn. This changes the printing again to use inline functions that make it clear to the compiler that the line that does the NULL check has no idea whether the argument is a constant NULL. Link: http://lkml.kernel.org/r/20171117101545.119689-1-arnd@arndb.de Fixes: 0205f75571e3 ("mm: simplify nodemask printing") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Zhangshaokun <zhangshaokun@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
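A rough userspace model of the inline-function approach, for illustration only: the NULL check lives inside a function that receives an ordinary pointer parameter, so the compiler no longer sees the address of a specific object being compared with NULL. `nodemask_model`, `MODEL_MAX_NODES`, `pr_numnodes()`, `pr_bits()` and `model_pr_args()` are hypothetical names, not the identifiers used in include/linux/nodemask.h.

```c
#include <stddef.h>

#define MODEL_MAX_NODES 64   /* hypothetical stand-in for the node count */

struct nodemask_model {
	unsigned long bits[1];
};

/* Number of bits to print: zero when no mask was supplied. */
static inline unsigned int pr_numnodes(const struct nodemask_model *m)
{
	return m ? MODEL_MAX_NODES : 0;
}

/* Bitmap to print: NULL when no mask was supplied. */
static inline const unsigned long *pr_bits(const struct nodemask_model *m)
{
	return m ? m->bits : NULL;
}

/* Expands to the (width, bitmap) argument pair a bitmap format expects. */
#define model_pr_args(maskp) pr_numnodes(maskp), pr_bits(maskp)
```

Calling `pr_numnodes(&nodes)` on a stack object does not trigger -Waddress, because the comparison is against the parameter `m` rather than against `&nodes` directly.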
2017-11-16dm bufio: fix integer overflow when limiting maximum cache sizeEric Biggers1-9/+6
The default max_cache_size_bytes for dm-bufio is meant to be the lesser of 25% of the size of the vmalloc area and 2% of the size of lowmem. However, on 32-bit systems the intermediate result in the expression (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100 overflows, causing the wrong result to be computed. For example, on a 32-bit system where the vmalloc area is 520093696 bytes, the result is 1174405 rather than the expected 130023424, which makes the maximum cache size much too small (far less than 2% of lowmem). This causes severe performance problems for dm-verity users on affected systems. Fix this by using mult_frac() to correctly multiply by a percentage. Do this for all places in dm-bufio that multiply by a percentage. Also replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary to the comment is now defined in include/linux/vmalloc.h. Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset") Fixes: 95d402f057f2 ("dm: add bufio") Cc: <stable@vger.kernel.org> # v3.2+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
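The arithmetic in the example above can be reproduced in userspace. The sketch below contrasts the naive 32-bit expression with a quotient/remainder split in the style of mult_frac(); `percent_naive()` and `percent_safe()` are hypothetical helper names, and `uint32_t` stands in for a 32-bit `unsigned long`.

```c
#include <stdint.h>

/* Naive percentage: the product x * percent is truncated to 32 bits. */
static uint32_t percent_naive(uint32_t x, uint32_t percent)
{
	return x * percent / 100;
}

/*
 * Overflow-safe form: divide first, then scale the quotient and the
 * remainder separately, so no intermediate product exceeds the range
 * of the type for percent <= 100.
 */
static uint32_t percent_safe(uint32_t x, uint32_t percent)
{
	uint32_t quot = x / 100;
	uint32_t rem = x % 100;

	return quot * percent + rem * percent / 100;
}
```

With x = 520093696 and percent = 25, the product 13002342400 wraps modulo 2^32 to 117440512, so the naive form yields 1174405, while the split form yields 5200936 * 25 + 96 * 25 / 100 = 130023424.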
2017-11-16dm: clear all discard attributes in queue_limits when discards are disabledMike Snitzer1-2/+8
Otherwise, it can happen that the QUEUE_FLAG_DISCARD isn't set but the various discard attributes (which get exposed via sysfs) may be set. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16dm: do not set 'discards_supported' in targets that do not need itMike Snitzer2-7/+0
The DM target's 'discards_supported' flag is intended to act as an override. Meaning, even if the underlying storage doesn't support discards the DM target will. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16dm: discard support requires all targets in a table support discardsMike Snitzer1-19/+14
A DM device with a mix of discard capabilities (due to some underlying devices not having discard support) _should_ just return -EOPNOTSUPP for the region of the device that doesn't support discards (even if only by way of the underlying driver formally not supporting discards). BUT, that does ask the underlying driver to handle something that it never advertised support for. In doing so we're exposing users to the potential for an underlying disk driver hanging if/when a discard is issued to a device that is incapable and never claimed to support discards. Fix this by requiring that each DM target in a DM table provide discard support as a prereq for a DM device to advertise support for discards. This may cause some configurations that were happily supporting discards (even in the face of a mix of discard support) to stop supporting discards -- but the risk of users hitting driver hangs, and forced reboots, outweighs supporting those fringe mixed discard configurations. Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
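The "every target must support discards" rule amounts to an all-of check over the table. A minimal sketch, assuming a hypothetical `target_model` type with a single capability flag; the real check in the DM core looks at more state than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one DM target in a table. */
struct target_model {
	bool supports_discard;
};

/*
 * The device may advertise discard support only if no target in the
 * table lacks it. A single incapable target disables discards for the
 * whole device.
 */
static bool table_supports_discards(const struct target_model *targets,
				    size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!targets[i].supports_discard)
			return false;
	}
	return true;
}
```

The previous behaviour corresponds to an any-of check, which is what allowed a discard to reach a driver that never claimed to handle it.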
2017-11-16dm mpath: remove annoying message of 'blk_get_request() returned -11'Ming Lei1-2/+0
It is very normal to see allocation failure, especially with blk-mq request_queues, so it's unnecessary to report this error and annoy people. In practice this 'blk_get_request() returned -11' error gets logged quite frequently when a blk-mq DM multipath device sees heavy IO. This change is marked for stable@ because the annoying message in question was included in stable@ commit 7083abbbf. Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16RDMA: Add Jason Gunthorpe as a co-maintainerJason Gunthorpe2-0/+3
As was discussed in September and October, add Jason along with Doug to have a team maintainership model for the RDMA subsystem. Mellanox Technologies will be funding Jason's independent work on the maintainership. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-16arm64: dts: uniphier: route on-board device IRQ to GPIO controller for PXs3Masahiro Yamada1-1/+2
Commit 429f203eb712 ("arm64: dts: uniphier: route on-board device IRQ to GPIO controller") failed to update this DTS. It becomes a real problem when arm and arm64 trees are merged together. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-11-15memory hotplug: fix comments when adding sectionFan Du1-1/+1
Here, pfn_to_node should be page_to_nid. Link: http://lkml.kernel.org/r/1510735205-22540-1-git-send-email-fan.du@intel.com Signed-off-by: Fan Du <fan.du@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAPOscar Salvador1-7/+7
free_area_init_node() calls alloc_node_mem_map(), but this function does nothing unless we have CONFIG_FLAT_NODE_MEM_MAP. As a cleanup, we can move the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" within alloc_node_mem_map() out of the function, and define an empty alloc_node_mem_map() { } when CONFIG_FLAT_NODE_MEM_MAP is not present. This also moves the printk that lies within the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" block from free_area_init_node() to alloc_node_mem_map(), getting rid of the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" in free_area_init_node(). [akpm@linux-foundation.org: clean up the printk while we're there] Link: http://lkml.kernel.org/r/20171114111935.GA11758@techadventures.net Signed-off-by: Oscar Salvador <osalvador@techadventures.net> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
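The shape of this cleanup, sketched outside the kernel: the conditional moves from inside the function body (and from the caller) to the definition site, with an empty stub for the disabled case. `MODEL_FLAT_NODE_MEM_MAP` and `node_model` are hypothetical; the function names only mirror those in the message and the bodies are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for the per-node data structure. */
struct node_model {
	int id;
	int has_mem_map;
};

#ifdef MODEL_FLAT_NODE_MEM_MAP
/* Real work, including the diagnostic print, lives in one place. */
static void alloc_node_mem_map(struct node_model *node)
{
	node->has_mem_map = 1;
	printf("node %d: mem_map allocated\n", node->id);
}
#else
/* Empty stub: the caller needs no #ifdef of its own. */
static void alloc_node_mem_map(struct node_model *node)
{
	(void)node;
}
#endif

static void free_area_init_node(struct node_model *node)
{
	alloc_node_mem_map(node);   /* unconditional call, no #ifdef here */
}
```

Building with `-DMODEL_FLAT_NODE_MEM_MAP` selects the first definition; without it the call compiles to nothing.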
2017-11-15mm: simplify nodemask printingMichal Hocko3-18/+10
alloc_warn() and dump_header() have to explicitly handle NULL nodemask which forces both paths to use pr_cont. We can do better. printk already handles NULL pointers properly so all we need is to teach nodemask_pr_args to handle NULL nodemask carefully. This allows simplification of both alloc_warn() and dump_header() and gets rid of pr_cont altogether. This patch has been motivated by patch from Joe Perches http://lkml.kernel.org/r/b31236dfe3fc924054fd7842bde678e71d193638.1509991345.git.joe@perches.com [akpm@linux-foundation.org: fix tile warning, per Arnd] Link: http://lkml.kernel.org/r/20171109100531.3cn2hcqnuj7mjaju@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Joe Perches <joe@perches.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm,oom_reaper: remove pointless kthread_run() error checkTetsuo Handa1-8/+0
Since oom_init() is called before userspace processes start, memory allocation failure for creating the OOM reaper kernel thread will let the OOM killer call panic() rather than wake up the OOM reaper. Link: http://lkml.kernel.org/r/1510137800-4602-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm/page_ext.c: check if page_ext is not preparedJaewon Kim1-4/+0
online_page_ext() and page_ext_init() allocate page_ext for each section, but they do not allocate if the first PFN is !pfn_present(pfn) or !pfn_valid(pfn). Then section->page_ext remains as NULL. lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled. For a valid PFN, __set_page_owner will try to get page_ext through lookup_page_ext. Without CONFIG_DEBUG_VM lookup_page_ext will misuse NULL pointer as value 0. This incurs an invalid address access. This is the panic example when PFN 0x100000 is not valid but PFN 0x13FC00 is being used for page_ext. section->page_ext is NULL, get_entry returned invalid page_ext address as 0x1DFA000 for a PFN 0x13FC00. To avoid this panic, the CONFIG_DEBUG_VM guard should be removed so that page_ext will be checked at all times. Unable to handle kernel paging request at virtual address 01dfa014 ------------[ cut here ]------------ Kernel BUG at ffffff80082371e0 [verbose debug info unavailable] Internal error: Oops: 96000045 [#1] PREEMPT SMP Modules linked in: PC is at __set_page_owner+0x48/0x78 LR is at __set_page_owner+0x44/0x78 __set_page_owner+0x48/0x78 get_page_from_freelist+0x880/0x8e8 __alloc_pages_nodemask+0x14c/0xc48 __do_page_cache_readahead+0xdc/0x264 filemap_fault+0x2ac/0x550 ext4_filemap_fault+0x3c/0x58 __do_fault+0x80/0x120 handle_mm_fault+0x704/0xbb0 do_page_fault+0x2e8/0x394 do_mem_abort+0x88/0x124 Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return value of lookup_page_ext for all call sites").
Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com Fixes: eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging") Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: <stable@vger.kernel.org> [depends on f86e427197, see above] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
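A simplified model of the lookup with the NULL check made unconditional. The types and `lookup_page_ext_model()` are hypothetical; the real lookup computes the entry address from the PFN and a per-section base, which this sketch reduces to plain array indexing.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the per-page extension and its section. */
struct page_ext_model {
	unsigned long flags;
};

struct section_model {
	struct page_ext_model *page_ext;   /* NULL if never allocated */
};

/*
 * Always check the base pointer, not only in debug builds. Without the
 * check, indexing from a NULL base yields a small bogus address, much
 * like the 0x1DFA000 value in the report above.
 */
static struct page_ext_model *
lookup_page_ext_model(const struct section_model *section, size_t index)
{
	if (!section->page_ext)
		return NULL;
	return &section->page_ext[index];
}
```

Callers then have to handle a NULL return, which is why the message notes the dependency on the commit that added those checks at all call sites.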
2017-11-15writeback: remove unused function parameterWang Long2-3/+3
The parameter `struct bdi_writeback *wb` is not used in the function body. Remove it. Link: http://lkml.kernel.org/r/1509685485-15278-1-git-send-email-wanglong19@meituan.com Signed-off-by: Wang Long <wanglong19@meituan.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: do not rely on preempt_count in print_vma_addrMichal Hocko1-5/+3
The preempt count check on print_vma_addr has been added by commit e8bff74afbdb ("x86: fix "BUG: sleeping function called from invalid context" in print_vma_addr()") and it relied on the elevated preempt count from preempt_conditional_sti because preempt_count check doesn't work on non preemptive kernels by default. The code has evolved though and commit d99e1bd175f4 ("x86/entry/traps: Refactor preemption and interrupt flag handling") has replaced preempt_conditional_sti by an explicit preempt_disable which is a no-op on !PREEMPT so the check in print_vma_addr is broken. Fix the issue by using trylock on mmap_sem rather than checking the preempt count. The allocation we are relying on has to be GFP_NOWAIT as well. There is a chance that we won't dump the vma state if the lock is contended or the memory short but this is an acceptable outcome and much less fragile than the not working preemption check or tricks around it. Link: http://lkml.kernel.org/r/20171106134031.g6dbelg55mrbyc6i@dhcp22.suse.cz Fixes: d99e1bd175f4 ("x86/entry/traps: Refactor preemption and interrupt flag handling") Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Yang Shi <yang.s@alibaba-inc.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
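The "try, and give up instead of sleeping" pattern can be modelled in userspace with a C11 `atomic_flag` playing the role of the lock. This only illustrates the control flow; `mm_model`, `model_trylock()`, `model_unlock()` and `print_vma_addr_model()` are hypothetical, and the kernel code uses its own rwsem and allocator primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an address space with one lock. */
struct mm_model {
	atomic_flag lock;   /* set means "held" */
	int dumps;          /* how many times the state was dumped */
};

/* Non-blocking acquire: returns false immediately if already held. */
static bool model_trylock(struct mm_model *mm)
{
	return !atomic_flag_test_and_set(&mm->lock);
}

static void model_unlock(struct mm_model *mm)
{
	atomic_flag_clear(&mm->lock);
}

/*
 * Best-effort dump: skip the output when the lock is contended rather
 * than sleeping in a context that may not be allowed to sleep.
 */
static bool print_vma_addr_model(struct mm_model *mm)
{
	if (!model_trylock(mm))
		return false;
	mm->dumps++;
	model_unlock(mm);
	return true;
}
```

The trade-off is the one stated in the message: an occasional missing dump in exchange for never depending on the preempt count to decide whether sleeping is safe.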
2017-11-15mm, sparse: do not swamp log with huge vmemmap allocation failuresMichal Hocko2-3/+10
While doing memory hotplug tests under heavy memory pressure we have noticed too many page allocation failures when allocating vmemmap memmap backed by huge pages: kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO) [...] Call Trace: dump_trace+0x59/0x310 show_stack_log_lvl+0xea/0x170 show_stack+0x21/0x40 dump_stack+0x5c/0x7c warn_alloc_failed+0xe2/0x150 __alloc_pages_nodemask+0x3ed/0xb20 alloc_pages_current+0x7f/0x100 vmemmap_alloc_block+0x79/0xb6 __vmemmap_alloc_block_buf+0x136/0x145 vmemmap_populate+0xd2/0x2b9 sparse_mem_map_populate+0x23/0x30 sparse_add_one_section+0x68/0x18e __add_pages+0x10a/0x1d0 arch_add_memory+0x4a/0xc0 add_memory_resource+0x89/0x160 add_memory+0x6d/0xd0 acpi_memory_device_add+0x181/0x251 acpi_bus_attach+0xfd/0x19b acpi_bus_scan+0x59/0x69 acpi_device_hotplug+0xd2/0x41f acpi_hotplug_work_fn+0x1a/0x23 process_one_work+0x14e/0x410 worker_thread+0x116/0x490 kthread+0xbd/0xe0 ret_from_fork+0x3f/0x70 and we do see many of those because essentially every allocation fails for each memory section. This is an excessive way to tell the user that there is nothing to really worry about because we do have a fallback mechanism to use base pages. The only downside might be a performance degradation due to TLB pressure. This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn explicitly once on the first allocation failure. This will reduce the noise in the kernel log considerably, while we still have an indication that performance might be impacted.
[mhocko@kernel.org: forgot to git add the follow up fix] Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
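The "suppress the allocator's own warning, then warn once by hand" approach can be sketched as follows. `vmemmap_alloc_model()`, its `huge_alloc_ok` parameter and the `warnings_emitted` counter are hypothetical and exist only to make the behaviour observable in userspace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static int warnings_emitted;   /* test hook: counts printed warnings */

/*
 * Model of an allocation whose failure is expected and recoverable.
 * The failure path reports the problem on the first occurrence only,
 * then stays silent; the caller falls back to smaller pages.
 */
static void *vmemmap_alloc_model(bool huge_alloc_ok)
{
	static bool warned;
	static char block[1];

	if (huge_alloc_ok)
		return block;

	if (!warned) {
		fprintf(stderr,
			"huge vmemmap allocation failed, falling back to base pages\n");
		warnings_emitted++;
		warned = true;
	}
	return NULL;
}
```

A hundred consecutive failures produce a single log line, which still signals that the fallback is in use and that performance might be affected.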
2017-11-15mm/hmm: remove redundant variable align_endColin Ian King1-2/+1
Variable align_end is assigned a value but it is never read, so the variable is redundant and can be removed. Cleans up the clang warning: Value stored to 'align_end' is never read Link: http://lkml.kernel.org/r/20171017143837.23207-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm/list_lru.c: mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation for enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Link: http://lkml.kernel.org/r/20171020190754.GA24332@embeddedor.com Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
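For illustration, a small switch with one intentional fall-through marked by a comment, which GCC's -Wimplicit-fallthrough recognises at its default level. The enum, struct and function names are hypothetical and are not taken from mm/list_lru.c.

```c
/* Hypothetical outcomes of visiting one list item. */
enum walk_result {
	WALK_REMOVED_RETRY,
	WALK_REMOVED,
	WALK_SKIP
};

struct tally {
	int isolated;
	int restarts;
};

static void account(enum walk_result result, struct tally *t)
{
	switch (result) {
	case WALK_REMOVED_RETRY:
		t->restarts++;
		/* fall through */
	case WALK_REMOVED:
		t->isolated++;   /* shared by both "removed" outcomes */
		break;
	case WALK_SKIP:
		break;
	}
}
```

Without the comment, the compiler cannot tell a deliberate fall-through from a forgotten `break`, so it would warn on the first case once the option is enabled.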