aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-06-07perf script python: Remove dups in documentation examplesSeongJae Park1-4/+2
Few shell command examples in perf-script-python.txt has few nitpicks include: - tools/perf/scripts/python directory listing command is unnecessarily repeated. - few examples contain additional information in command prompt unnecessarily and inconsistently. This commit fixes them to enhance readability of the document. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> Fixes: cff68e582237 ("perf/scripts: Add perf-trace-python Documentation") Link: http://lkml.kernel.org/r/20170530111827.21732-4-sj38.park@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-07perf script python: Updated trace_unhandled() signatureSeongJae Park1-6/+3
Default function signature of trace_unhandled() got changed to include a field dict, but its documentation, perf-script-python.txt has not been updated. Fix it. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pierre Tardy <tardyp@gmail.com> Fixes: c02514850d67 ("perf scripts python: Give field dict to unhandled callback") Link: http://lkml.kernel.org/r/20170530111827.21732-6-sj38.park@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-07perf script python: Fix wrong code snippets in documentationSeongJae Park1-2/+2
This commit fixes wrong code snippets for trace_begin() and trace_end() function example definition. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> Fixes: cff68e582237 ("perf/scripts: Add perf-trace-python Documentation") Link: http://lkml.kernel.org/r/20170530111827.21732-5-sj38.park@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-07perf script: Fix documentation errorsSeongJae Park2-3/+3
This commit fixes two errors in documents for perf-script-python and perf-script-perl as below: - /sys/kernel/debug/tracing events -> /sys/kernel/debug/tracing/events/ - trace_handled -> trace_unhandled Signed-off-by: SeongJae Park <sj38.park@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> Fixes: cff68e582237 ("perf/scripts: Add perf-trace-python Documentation") Link: http://lkml.kernel.org/r/20170530111827.21732-3-sj38.park@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-07perf script: Fix outdated comment for perf-trace-pythonSeongJae Park1-1/+1
Script generated by the '--gen-script' option contains an outdated comment. It mentions a 'perf-trace-python' document while it has been renamed to 'perf-script-python'. Fix it. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 133dc4c39c57 ("perf: Rename 'perf trace' to 'perf script'") Link: http://lkml.kernel.org/r/20170530111827.21732-2-sj38.park@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-07perf probe: Fix examples section of documentationSeongJae Park1-2/+6
An example in perf-probe documentation for pattern of function name based probe addition is not providing example command for that case. This commit fixes the example to give appropriate example command. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Fixes: ee391de876ae ("perf probe: Update perf probe document") Link: http://lkml.kernel.org/r/20170507103642.30560-1-sj38.park@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-05perf report: Ensure the perf DSO mapping matches what libdw seesMilian Wolff1-0/+8
In some situations the libdw unwinder stopped working properly. I.e. with libunwind we see: ~~~~~ heaptrack_gui 2228 135073.400112: 641314 cycles: e8ed _dl_fixup (/usr/lib/ld-2.25.so) 15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so) ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0) 608f3 _GLOBAL__sub_I_kdynamicjobtracker.cpp (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0) f199 call_init.part.0 (/usr/lib/ld-2.25.so) f2a5 _dl_init (/usr/lib/ld-2.25.so) db9 _dl_start_user (/usr/lib/ld-2.25.so) ~~~~~ But with libdw and without this patch this sample is not properly unwound: ~~~~~ heaptrack_gui 2228 135073.400112: 641314 cycles: e8ed _dl_fixup (/usr/lib/ld-2.25.so) 15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so) ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0) ~~~~~ Debug output showed me that libdw found a module for the last frame address, but it thinks it belongs to /usr/lib/ld-2.25.so. This patch double-checks what libdw sees and what perf knows. If the mappings mismatch, we now report the elf known to perf. This fixes the situation above, and the libdw unwinder produces the same stack as libunwind. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20170602143753.16907-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-05perf report: Include partial stacks unwound with libdwMilian Wolff1-1/+1
So far the whole stack was thrown away when any error occurred before the maximum stack depth was unwound. This is actually a very common scenario though. The stacks that got unwound so far are still interesting. This removes a large chunk of differences when comparing perf script output for libunwind and libdw perf unwinding. E.g. with libunwind: ~~~~~ heaptrack_gui 2228 135073.388524: 479408 cycles: ffffffff811749ed perf_iterate_ctx ([kernel.kallsyms]) ffffffff81181662 perf_event_mmap ([kernel.kallsyms]) ffffffff811cf5ed mmap_region ([kernel.kallsyms]) ffffffff811cfe6b do_mmap ([kernel.kallsyms]) ffffffff811b0dca vm_mmap_pgoff ([kernel.kallsyms]) ffffffff811cdb0c sys_mmap_pgoff ([kernel.kallsyms]) ffffffff81033acb sys_mmap ([kernel.kallsyms]) ffffffff81631d37 entry_SYSCALL_64_fastpath ([kernel.kallsyms]) 192ca mmap64 (/usr/lib/ld-2.25.so) 59a9 _dl_map_object_from_fd (/usr/lib/ld-2.25.so) 83d0 _dl_map_object (/usr/lib/ld-2.25.so) cda1 openaux (/usr/lib/ld-2.25.so) 1834f _dl_catch_error (/usr/lib/ld-2.25.so) cfe2 _dl_map_object_deps (/usr/lib/ld-2.25.so) 3481 dl_main (/usr/lib/ld-2.25.so) 17387 _dl_sysdep_start (/usr/lib/ld-2.25.so) 4d37 _dl_start (/usr/lib/ld-2.25.so) d87 _start (/usr/lib/ld-2.25.so) heaptrack_gui 2228 135073.388677: 611329 cycles: 1a3e0 strcmp (/usr/lib/ld-2.25.so) 82b2 _dl_map_object (/usr/lib/ld-2.25.so) cda1 openaux (/usr/lib/ld-2.25.so) 1834f _dl_catch_error (/usr/lib/ld-2.25.so) cfe2 _dl_map_object_deps (/usr/lib/ld-2.25.so) 3481 dl_main (/usr/lib/ld-2.25.so) 17387 _dl_sysdep_start (/usr/lib/ld-2.25.so) 4d37 _dl_start (/usr/lib/ld-2.25.so) d87 _start (/usr/lib/ld-2.25.so) ~~~~~ With libdw without this patch: ~~~~~ heaptrack_gui 2228 135073.388524: 479408 cycles: ffffffff811749ed perf_iterate_ctx ([kernel.kallsyms]) ffffffff81181662 perf_event_mmap ([kernel.kallsyms]) ffffffff811cf5ed mmap_region ([kernel.kallsyms]) ffffffff811cfe6b do_mmap ([kernel.kallsyms]) ffffffff811b0dca vm_mmap_pgoff ([kernel.kallsyms]) ffffffff811cdb0c sys_mmap_pgoff ([kernel.kallsyms]) ffffffff81033acb sys_mmap ([kernel.kallsyms]) ffffffff81631d37 entry_SYSCALL_64_fastpath ([kernel.kallsyms]) heaptrack_gui 2228 135073.388677: 611329 cycles: ~~~~~ With this patch applied, the libdw unwinder will produce the same output as the libunwind unwinder. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20170601210021.20046-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-05perf annotate: Add missing powerpc tripletKim Phillips1-0/+1
On an Ubuntu xenial system, 'perf annotate' says to install powerpc objdump on a system that already has binutils-powerpc-linux-gnu installed. Make perf aware of the missing triplet for the powerpc-linux-gnu target. Signed-off-by: Kim Phillips <kim.phillips@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20170529142754.7fbfb1152fd8f2663de0ea70@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-05perf test: Disable breakpoint signal tests for powerpcJiri Olsa3-0/+24
The following tests are failing on powerpc: # perf test break 18: Breakpoint overflow signal handler : FAILED! 19: Breakpoint overflow sampling : FAILED! The powerpc kenel so far does not have support to even create instruction breakpoints using the perf event interface, so those tests fail early in the config phase. I added a '->is_supported()' callback to test struct to be able to disable specific tests. It seems better than putting ifdefs directly to the test array. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170601205450.GA398@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-05perf symbols: Use correct filename for compressed modules in build-id cacheNamhyung Kim1-4/+1
The decompress_kmodule() decompresses kernel modules in order to load symbols from it. In the DSO_BINARY_TYPE__BUILD_ID_CACHE case, it needs the full file path to extract the file extension to determine the decompression method. But overwriting 'name' will fail the decompression since it might point to a non-existing old file. Instead, use dso->long_name for having the correct extension and use the real filename to decompress. In the DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP case, both names should be the same. This allows resolving symbols in the old modules. Before: $ perf report -i perf.data.old | grep scsi_mod 0.00% cc1 [scsi_mod] [k] 0x0000000000004aa6 0.00% as [scsi_mod] [k] 0x00000000000099e1 0.00% cc1 [scsi_mod] [k] 0x0000000000009830 0.00% cc1 [scsi_mod] [k] 0x0000000000001b8f After: 0.00% cc1 [scsi_mod] [k] scsi_handle_queue_ramp_up 0.00% as [scsi_mod] [k] scsi_sg_alloc 0.00% cc1 [scsi_mod] [k] scsi_setup_cmnd 0.00% cc1 [scsi_mod] [k] scsi_get_command Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170531120105.21731-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-05perf symbols: Set module info when build-id event foundNamhyung Kim4-11/+20
Like machine__findnew_module_dso(), it should set necessary info for kernel modules to find symbol info from the file. Factor out dso__set_module_info() to do it. This is needed for dso__needs_decompress() to detect such DSOs. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170531120105.21731-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-05perf header: Set proper module name when build-id event foundNamhyung Kim1-2/+10
When perf processes build-id event, it creates DSOs with the build-id. But it didn't set the module short name (like '[module-name]') so when processing a kernel mmap event of the module, it cannot found the DSO as it only checks the short names. That leads for perf to create a same DSO without the build-id info and it'll lookup the system path even if the DSO is already in the build-id cache. After kernel was updated, perf cannot find the DSO and cannot show symbols in it anymore. You can see this if you have an old data file (w/ old kernel version): $ perf report -i perf.data.old -v |& grep scsi_mod build id event received for /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz : cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1 Failed to open /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz, continuing without symbols ... The second message didn't show the build-id. With this patch: $ perf report -i perf.data.old -v |& grep scsi_mod build id event received for /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz: cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1 /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz with build id cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1 not found, continuing without symbols ... Now it shows the build-id but still cannot load the symbol table. This is a different problem which will be fixed in the next patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170531120105.21731-1-namhyung@kernel.org [ Fix the build on older compilers (debian <= 8, fedora <= 21, etc) wrt kmod_path var init ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-05ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-MArd Biesheuvel2-10/+11
As reported by Patrice, the header layout of the decompressor is incorrect when building for v7-M. In this case, the __nop macro resolves to 'mov r0, r0', which is emitted as a narrow encoding, resulting in the header data fields to end up at lower offsets than required. Given the variety of targets we need to support with the same code, the startup sequence is a bit of a jumble, and uses instructions and macros whose encoding widths cannot be specified (badr), or only exist in a narrow encoding (bx) So force the use of a wide encoding in __nop, and replace the start sequence with a simple jump to the label marking the start of code, preceded by a Thumb2 mode switch if required (using explicit wide encodings where appropriate). The label itself can be moved to the start of code [where it belongs] due to the larger range of branch instructions as compared to adr instructions. Reported-by: Patrice CHOTARD <patrice.chotard@st.com> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-05ARM: 8676/1: NOMMU: provide pgprot_device() macroVladimir Murzin1-0/+1
NOMMU build leads to the following error: CC drivers/pci/mmap.o drivers/pci/mmap.c: In function 'pci_mmap_resource_range': drivers/pci/mmap.c:60:3: error: implicit declaration of function 'pgprot_device' [-Werror=implicit-function-declaration] vma->vm_page_prot = pgprot_device(vma->vm_page_prot); ^ cc1: some warnings being treated as errors scripts/Makefile.build:302: recipe for target 'drivers/pci/mmap.o' failed make[2]: *** [drivers/pci/mmap.o] Error 1 scripts/Makefile.build:561: recipe for target 'drivers/pci' failed make[1]: *** [drivers/pci] Error 2 Makefile:1016: recipe for target 'drivers' failed make: *** [drivers] Error 2 Fix it with support of pgprot_device() macro for NOMMU. Fixes: 00d2904ffeac ("ARM/PCI: Use generic pci_mmap_resource_range()") Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-04Linux 4.12-rc4Linus Torvalds1-1/+1
2017-06-04fs/ufs: Set UFS default maximum bytes per fileRichard Narron1-3/+2
This fixes a problem with reading files larger than 2GB from a UFS-2 file system: https://bugzilla.kernel.org/show_bug.cgi?id=195721 The incorrect UFS s_maxsize limit became a problem as of commit c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") which started using s_maxbytes to avoid a page index overflow in do_generic_file_read(). That caused files to be truncated on UFS-2 file systems because the default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it. Here I simply increase the default to a common value used by other file systems. Signed-off-by: Richard Narron <comet.berkeley@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Will B <will.brokenbourgh2877@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> # v4.9 and backports of c2a9737f45e2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-04Revert "tty: fix port buffer locking"Greg Kroah-Hartman1-2/+0
This reverts commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e. It causes lots of warnings and problems so for now, let's just revert it. Reported-by: <valdis.kletnieks@vt.edu> Reported-by: Russell King <linux@armlinux.org.uk> Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Jiri Slaby <jslaby@suse.cz> Reported-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-03nfs: Mark unnecessarily extern functions as staticJan Kara2-4/+3
nfs_initialise_sb() and nfs_clone_super() are declared as extern even though they are used only in fs/nfs/super.c. Mark them as static. Also remove explicit 'inline' directive from nfs_initialise_sb() and leave it upto compiler to decide whether inlining is worth it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-06-03hwmon: (aspeed-pwm-tacho) make fan/pwm names start with index 1Stefan Schaeckeler1-26/+26
Make fan and pwm names in sysfs start with index 1 in accordance to Documentation/hwmon/sysfs-interface conventions. Current implementation starts with index 0, making tools such as sensors(1) skip the first fan. Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com> Fixes: 2d7a548a3eff ("drivers: hwmon: Support for ASPEED PWM/Fan tach") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-06-03hwmon: (aspeed-pwm-tacho) Call of_node_put() on a node not claimedStefan Schaeckeler1-1/+0
Call of_node_put() on a node claimed with of_node_get() or by any other means such as for_each_child_of_node(). Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com> Fixes: 2d7a548a3eff ("drivers: hwmon: Support for ASPEED PWM/Fan tach") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-06-02Input: axp20x-pek - switch to acpi_dev_present and check for ACPI0011 tooHans de Goede1-2/+3
acpi_dev_found checks that there is a matching ACPI node, but it may be disabled (_STA method returns 0) in which case the soc_button_array driver will not bind to it and axp20x-pek should handle the power-button. This commit switches from acpi_dev_found to acpi_dev_present to avoid not registering an input-dev for the powerbutton when there is a disabled PNP0C40 device. The ACPI-6.0 standard defines a standard gpio button device using the ACPI0011 HID replacing the custom PNP0C40 gpio device, many newer devices define both PNP0C40 and ACPI0011 devices enabling one or the other depending on whether the BIOS thinks it is going to boot Android or Windows. This commit adds a check for the ACPI0011 device, so that if either device is present *and* enabled we don't register an input-dev for the powerbutton. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-06-02Input: axp20x-pek - only check for "INTCFD9" ACPI device on Cherry TrailHans de Goede1-7/+36
Commit 9b13a4ca8d2c ("Input: axp20x-pek - do not register input device on some systems") added a check for the INTCFD9 ACPI device which also handles the powerbutton as on some systems the powerbutton is connected to both the PMIC, handled by axp20x-pek, and to a gpio on the SoC, handled by soc_button_array which attaches itself to the INTCFD9 ACPI device. Testing + comparing DSDTs has shown that this only happens on Cherry Trail devices with an AXP288 PMIC, the AXP288 PMIC is also used on Bay Trail devices but there the power button is only connected to the PMIC and not handled by soc_button_array. This means that the INTCFD9 check has caused a regression on Bay Trail devices, causing power-button presses to no longer be seen. This commit fixes this by limiting the check to devices where the ACPI node for the AXP288 contains a _HRV (hardware revision) attribute with a value of 3 which indicates we are dealing with a Cherry Trail platform. Fixes: 9b13a4ca8d2c ("Input: axp20x-pek - do not register input ...") Reported-by: Сергей Трусов <t.rus76@ya.ru> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-06-02scripts/gdb: make lx-dmesg command work (reliably)André Draszik1-4/+5
lx-dmesg needs access to the log_buf symbol from printk.c. Unfortunately, the symbol log_buf also exists in BPF's verifier.c and hence gdb can pick one or the other. If it happens to pick BPF's log_buf, lx-dmesg doesn't work: (gdb) lx-dmesg Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x0: Error occurred in Python command: Cannot access memory at address 0x0 (gdb) p log_buf $15 = 0x0 Luckily, GDB has a way to deal with this, see https://sourceware.org/gdb/onlinedocs/gdb/Symbols.html (gdb) info variables ^log_buf$ All variables matching regular expression "^log_buf$": File <linux.git>/kernel/bpf/verifier.c: static char *log_buf; File <linux.git>/kernel/printk/printk.c: static char *log_buf; (gdb) p 'verifier.c'::log_buf $1 = 0x0 (gdb) p 'printk.c'::log_buf $2 = 0x811a6aa0 <__log_buf> "" (gdb) p &log_buf $3 = (char **) 0x8120fe40 <log_buf> (gdb) p &'verifier.c'::log_buf $4 = (char **) 0x8120fe40 <log_buf> (gdb) p &'printk.c'::log_buf $5 = (char **) 0x8048b7d0 <log_buf> By being explicit about the location of the symbol, we can make lx-dmesg work again. While at it, do the same for the other symbols we need from printk.c Link: http://lkml.kernel.org/r/20170526112222.3414-1-git@andred.net Signed-off-by: André Draszik <git@andred.net> Tested-by: Kieran Bingham <kieran@bingham.xyz> Acked-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02mm: consider memblock reservations for deferred memory initialization sizingMichal Hocko4-11/+54
We have seen an early OOM killer invocation on ppc64 systems with crashkernel=4096M: kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0 kthreadd cpuset=/ mems_allowed=7 CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1 Call Trace: dump_stack+0xb0/0xf0 (unreliable) dump_header+0xb0/0x258 out_of_memory+0x5f0/0x640 __alloc_pages_nodemask+0xa8c/0xc80 kmem_getpages+0x84/0x1a0 fallback_alloc+0x2a4/0x320 kmem_cache_alloc_node+0xc0/0x2e0 copy_process.isra.25+0x260/0x1b30 _do_fork+0x94/0x470 kernel_thread+0x48/0x60 kthreadd+0x264/0x330 ret_from_kernel_thread+0x5c/0xa4 Mem-Info: active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 slab_reclaimable:5 slab_unreclaimable:73 mapped:0 shmem:0 pagetables:0 bounce:0 free:0 free_pcp:0 free_cma:0 Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB 0 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 819200 pages RAM 0 pages HighMem/MovableOnly 817481 pages reserved 0 pages cma reserved 0 pages hwpoisoned the reason is that the managed memory is too low (only 110MB) while the rest of the the 50GB is still waiting for the deferred intialization to be done. update_defer_init estimates the initial memoty to initialize to 2GB at least but it doesn't consider any memory allocated in that range. In this particular case we've had Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB) so the low 2GB is mostly depleted. Fix this by considering memblock allocations in the initial static initialization estimation. Move the max_initialise to reset_deferred_meminit and implement a simple memblock_reserved_memory helper which iterates all reserved blocks and sums the size of all that start below the given address. The cumulative size is than added on top of the initial estimation. This is still not ideal because reset_deferred_meminit doesn't consider holes and so reservation might be above the initial estimation whihch we ignore but let's make the logic simpler until we really need to handle more complicated cases. Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set") Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specifiedJames Morse3-12/+24
KVM uses get_user_pages() to resolve its stage2 faults. KVM sets the FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it finds a VM_FAULT_HWPOISON. KVM handles these hwpoison pages as a special case. (check_user_page_hwpoison()) When huge pages are involved, this doesn't work so well. get_user_pages() calls follow_hugetlb_page(), which stops early if it receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning -EFAULT to the caller. The step to map this to -EHWPOISON based on the FOLL_ flags is missing. The hwpoison special case is skipped, and -EFAULT is returned to user-space, causing Qemu or kvmtool to exit. Instead, move this VM_FAULT_ to errno mapping code into a header file and use it from faultin_page() and follow_hugetlb_page(). With this, KVM works as expected. This isn't a problem for arm64 today as we haven't enabled MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86 too, so I think this should be a fix. This doesn't apply earlier than stable's v4.11.1 due to all sorts of cleanup. [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()] suggested. Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [4.11.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02mlock: fix mlock count can not decrease in race conditionYisheng Xie1-2/+3
Kefeng reported that when running the follow test, the mlock count in meminfo will increase permanently: [1] testcase linux:~ # cat test_mlockal grep Mlocked /proc/meminfo for j in `seq 0 10` do for i in `seq 4 15` do ./p_mlockall >> log & done sleep 0.2 done # wait some time to let mlock counter decrease and 5s may not enough sleep 5 grep Mlocked /proc/meminfo linux:~ # cat p_mlockall.c #include <sys/mman.h> #include <stdlib.h> #include <stdio.h> #define SPACE_LEN 4096 int main(int argc, char ** argv) { int ret; void *adr = malloc(SPACE_LEN); if (!adr) return -1; ret = mlockall(MCL_CURRENT | MCL_FUTURE); printf("mlcokall ret = %d\n", ret); ret = munlockall(); printf("munlcokall ret = %d\n", ret); free(adr); return 0; } In __munlock_pagevec() we should decrement NR_MLOCK for each page where we clear the PageMlocked flag. Commit 1ebb7cc6a583 ("mm: munlock: batch NR_MLOCK zone state updates") has introduced a bug where we don't decrement NR_MLOCK for pages where we clear the flag, but fail to isolate them from the lru list (e.g. when the pages are on some other cpu's percpu pagevec). Since PageMlocked stays cleared, the NR_MLOCK accounting gets permanently disrupted by this. Fix it by counting the number of page whose PageMlock flag is cleared. Fixes: 1ebb7cc6a583 (" mm: munlock: batch NR_MLOCK zone state updates") Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joern Engel <joern@logfs.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michel Lespinasse <walken@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: zhongjiang <zhongjiang@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02mm/migrate: fix refcount handling when !hugepage_migration_supported()Punit Agrawal1-6/+2
On failing to migrate a page, soft_offline_huge_page() performs the necessary update to the hugepage ref-count. But when !hugepage_migration_supported() , unmap_and_move_hugepage() also decrements the page ref-count for the hugepage. The combined behaviour leaves the ref-count in an inconsistent state. This leads to soft lockups when running the overcommitted hugepage test from mce-tests suite. Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000 soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head) INFO: rcu_preempt detected stalls on CPUs/tasks: Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715 (detected by 7, t=5254 jiffies, g=963, c=962, q=321) thugetlb_overco R running task 0 2715 2685 0x00000008 Call trace: dump_backtrace+0x0/0x268 show_stack+0x24/0x30 sched_show_task+0x134/0x180 rcu_print_detail_task_stall_rnp+0x54/0x7c rcu_check_callbacks+0xa74/0xb08 update_process_times+0x34/0x60 tick_sched_handle.isra.7+0x38/0x70 tick_sched_timer+0x4c/0x98 __hrtimer_run_queues+0xc0/0x300 hrtimer_interrupt+0xac/0x228 arch_timer_handler_phys+0x3c/0x50 handle_percpu_devid_irq+0x8c/0x290 generic_handle_irq+0x34/0x50 __handle_domain_irq+0x68/0xc0 gic_handle_irq+0x5c/0xb0 Address this by changing the putback_active_hugepage() in soft_offline_huge_page() to putback_movable_pages(). This only triggers on systems that enable memory failure handling (ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration (!ARCH_ENABLE_HUGEPAGE_MIGRATION). I imagine this wasn't triggered as there aren't many systems running this configuration. [akpm@linux-foundation.org: remove dead comment, per Naoya] Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.com Reported-by: Manoj Iyer <manoj.iyer@canonical.com> Tested-by: Manoj Iyer <manoj.iyer@canonical.com> Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> [3.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02dax: fix race between colliding PMD & PTE entriesRoss Zwisler1-0/+23
We currently have two related PMD vs PTE races in the DAX code. These can both be easily triggered by having two threads reading and writing simultaneously to the same private mapping, with the key being that private mapping reads can be handled with PMDs but private mapping writes are always handled with PTEs so that we can COW. Here is the first race: CPU 0 CPU 1 (private mapping write) __handle_mm_fault() create_huge_pmd() - FALLBACK handle_pte_fault() passes check for pmd_devmap() (private mapping read) __handle_mm_fault() create_huge_pmd() dax_iomap_pmd_fault() inserts PMD dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD installed in our page tables at this spot. Here's the second race: CPU 0 CPU 1 (private mapping read) __handle_mm_fault() passes check for pmd_none() create_huge_pmd() dax_iomap_pmd_fault() inserts PMD (private mapping write) __handle_mm_fault() create_huge_pmd() - FALLBACK (private mapping read) __handle_mm_fault() passes check for pmd_none() create_huge_pmd() handle_pte_fault() dax_iomap_pte_fault() inserts PTE dax_iomap_pmd_fault() inserts PMD, but we already have a PTE at this spot. The core of the issue is that while there is isolation between faults to the same range in the DAX fault handlers via our DAX entry locking, there is no isolation between faults in the code in mm/memory.c. This means for instance that this code in __handle_mm_fault() can run: if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) { ret = create_huge_pmd(&vmf); But by the time we actually get to run the fault handler called by create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE fault has installed a normal PMD here as a parent. This is the cause of the 2nd race. The first race is similar - there is the following check in handle_pte_fault(): } else { /* See comment in pte_alloc_one_map() */ if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd)) return 0; So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we will bail and retry the fault. This is correct, but there is nothing preventing the PMD from being installed after this check but before we actually get to the DAX PTE fault handlers. In my testing these races result in the following types of errors: BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1 BUG: non-zero nr_ptes on freeing mm: 15 Fix this issue by having the DAX fault handlers verify that it is safe to continue their fault after they have taken an entry lock to block other racing faults. [ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries] Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Pawel Lebioda <pawel.lebioda@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Pawel Lebioda <pawel.lebioda@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Xiong Zhou <xzhou@redhat.com> Cc: Eryu Guan <eguan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02mm: avoid spurious 'bad pmd' warning messagesRoss Zwisler1-10/+30
When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge pages, they were all added to the end of if() statements after existing pmd_trans_huge() checks. So, things like: - if (pmd_trans_huge(*pmd)) + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) When further checks were added after pmd_trans_unstable() checks by commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map") they were also added at the end of the conditional: + if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd)) This ordering is fine for pmd_trans_huge(), but doesn't work for pmd_trans_unstable(). This is because DAX huge pages trip the bad_pmd() check inside of pmd_none_or_trans_huge_or_clear_bad() (called by pmd_trans_unstable()), which prints out a warning and returns 1. So, we do end up doing the right thing, but only after spamming dmesg with suspicious looking messages: mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5) Reorder these checks in a helper so that pmd_devmap() is checked first, avoiding the error messages, and add a comment explaining why the ordering is important. Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map") Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Pawel Lebioda <pawel.lebioda@intel.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Xiong Zhou <xzhou@redhat.com> Cc: Eryu Guan <eguan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks onceTetsuo Handa1-1/+3
Roman Gushchin has reported that the OOM killer can trivially selects next OOM victim when a thread doing memory allocation from page fault path was selected as first OOM victim. allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0 allocate cpuset=/ mems_allowed=0 CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: oom_kill_process+0x219/0x3e0 out_of_memory+0x11d/0x480 __alloc_pages_slowpath+0xc84/0xd40 __alloc_pages_nodemask+0x245/0x260 alloc_pages_vma+0xa2/0x270 __handle_mm_fault+0xca9/0x10c0 handle_mm_fault+0xf3/0x210 __do_page_fault+0x240/0x4e0 trace_do_page_fault+0x37/0xe0 do_async_page_fault+0x19/0x70 async_page_fault+0x28/0x30 ... Out of memory: Kill process 492 (allocate) score 899 or sacrifice child Killed process 492 (allocate) total-vm:2052368kB, anon-rss:1894576kB, file-rss:4kB, shmem-rss:0kB allocate: page allocation failure: order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null) allocate cpuset=/ mems_allowed=0 CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: __alloc_pages_slowpath+0xd32/0xd40 __alloc_pages_nodemask+0x245/0x260 alloc_pages_vma+0xa2/0x270 __handle_mm_fault+0xca9/0x10c0 handle_mm_fault+0xf3/0x210 __do_page_fault+0x240/0x4e0 trace_do_page_fault+0x37/0xe0 do_async_page_fault+0x19/0x70 async_page_fault+0x28/0x30 ... oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB ... allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null), order=0, oom_score_adj=0 allocate cpuset=/ mems_allowed=0 CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: oom_kill_process+0x219/0x3e0 out_of_memory+0x11d/0x480 pagefault_out_of_memory+0x68/0x80 mm_fault_error+0x8f/0x190 ? handle_mm_fault+0xf3/0x210 __do_page_fault+0x4b2/0x4e0 trace_do_page_fault+0x37/0xe0 do_async_page_fault+0x19/0x70 async_page_fault+0x28/0x30 ... Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB There is a race window that the OOM reaper completes reclaiming the first victim's memory while nothing but mutex_trylock() prevents the first victim from calling out_of_memory() from pagefault_out_of_memory() after memory allocation for page fault path failed due to being selected as an OOM victim. This is a side effect of commit 9a67f6488eca926f ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath") because that commit silently changed the behavior from /* Avoid allocations with no watermarks from looping endlessly */ to /* * Give up allocations without trying memory reserves if selected * as an OOM victim */ in __alloc_pages_slowpath() by moving the location to check TIF_MEMDIE flag. I have noticed this change but I didn't post a patch because I thought it is an acceptable change other than noise by warn_alloc() because !__GFP_NOFAIL allocations are allowed to fail. But we overlooked that failing memory allocation from page fault path makes difference due to the race window explained above. While it might be possible to add a check to pagefault_out_of_memory() that prevents the first victim from calling out_of_memory() or remove out_of_memory() from pagefault_out_of_memory(), changing pagefault_out_of_memory() does not suppress noise by warn_alloc() when allocating thread was selected as an OOM victim. There is little point with printing similar backtraces and memory information from both out_of_memory() and warn_alloc(). Instead, if we guarantee that current thread can try allocations with no watermarks once when current thread looping inside __alloc_pages_slowpath() was selected as an OOM victim, we can follow "who can use memory reserves" rules and suppress noise by warn_alloc() and prevent memory allocations from page fault path from calling pagefault_out_of_memory(). If we take the comment literally, this patch would do - if (test_thread_flag(TIF_MEMDIE)) - goto nopage; + if (alloc_flags == ALLOC_NO_WATERMARKS || (gfp_mask & __GFP_NOMEMALLOC)) + goto nopage; because gfp_pfmemalloc_allowed() returns false if __GFP_NOMEMALLOC is given. But if I recall correctly (I couldn't find the message), the condition is meant to apply to only OOM victims despite the comment. Therefore, this patch preserves TIF_MEMDIE check. Fixes: 9a67f6488eca926f ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath") Link: http://lkml.kernel.org/r/201705192112.IAF69238.OQOHSJLFOFFMtV@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Roman Gushchin <guro@fb.com> Tested-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> [4.11] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02pcmcia: remove left-over %Z formatNicolas Iooss1-3/+3
Commit 5b5e0928f742 ("lib/vsprintf.c: remove %Z support") removed some usages of format %Z but forgot "%.2Zx". This makes clang 4.0 reports a -Wformat-extra-args warning because it does not know about %Z. Replace %Z with %z. Link: http://lkml.kernel.org/r/20170520090946.22562-1-nicolas.iooss_linux@m4x.org Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Harald Welte <laforge@gnumonks.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02slub/memcg: cure the brainless abuse of sysfs attributesThomas Gleixner1-2/+4
memcg_propagate_slab_attrs() abuses the sysfs attribute file functions to propagate settings from the root kmem_cache to a newly created kmem_cache. It does that with: attr->show(root, buf); attr->store(new, buf, strlen(bug); Aside of being a lazy and absurd hackery this is broken because it does not check the return value of the show() function. Some of the show() functions return 0 w/o touching the buffer. That means in such a case the store function is called with the stale content of the previous show(). That causes nonsense like invoking kmem_cache_shrink() on a newly created kmem_cache. In the worst case it would cause handing in an uninitialized buffer. This should be rewritten proper by adding a propagate() callback to those slub_attributes which must be propagated and avoid that insane conversion to and from ASCII, but that's too large for a hot fix. Check at least the return value of the show() function, so calling store() with stale content is prevented. Steven said: "It can cause a deadlock with get_online_cpus() that has been uncovered by recent cpu hotplug and lockdep changes that Thomas and Peter have been doing. Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug.lock); lock(slab_mutex); lock(cpu_hotplug.lock); lock(slab_mutex); *** DEADLOCK ***" Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02initramfs: fix disabling of initramfs (and its compression)Florian Fainelli1-0/+1
Commit db2aa7fd15e8 ("initramfs: allow again choice of the embedded initram compression algorithm") introduced the possibility to select the initramfs compression algorithm from Kconfig and while this is a nice feature it broke the use case described below. Here is what my build system does: - kernel is initially configured not to have an initramfs included - build the user space root file system - re-configure the kernel to have an initramfs included (CONFIG_INITRAMFS_SOURCE="/path/to/romfs") and set relevant CONFIG_INITRAMFS options, in my case, no compression option (CONFIG_INITRAMFS_COMPRESSION_NONE) - kernel is re-built with these options -> kernel+initramfs image is copied - kernel is re-built again without these options -> kernel image is copied Building a kernel without an initramfs means setting this option: CONFIG_INITRAMFS_SOURCE="" (and this one only) whereas building a kernel with an initramfs means setting these options: CONFIG_INITRAMFS_SOURCE="/home/fainelli/work/uclinux-rootfs/romfs /home/fainelli/work/uclinux-rootfs/misc/initramfs.dev" CONFIG_INITRAMFS_ROOT_UID=1000 CONFIG_INITRAMFS_ROOT_GID=1000 CONFIG_INITRAMFS_COMPRESSION_NONE=y CONFIG_INITRAMFS_COMPRESSION="" Commit db2aa7fd15e85 ("initramfs: allow again choice of the embedded initram compression algorithm") is problematic because CONFIG_INITRAMFS_COMPRESSION which is used to determine the initramfs_data.cpio extension/compression is a string, and due to how Kconfig works it will evaluate in order, how to assign it. Setting CONFIG_INITRAMFS_COMPRESSION_NONE with CONFIG_INITRAMFS_SOURCE="" cannot possibly work (because of the depends on INITRAMFS_SOURCE!="" imposed on CONFIG_INITRAMFS_COMPRESSION ) yet we still get CONFIG_INITRAMFS_COMPRESSION assigned to ".gz" because CONFIG_RD_GZIP=y is set in my kernel, even when there is no initramfs being built. So we basically end-up generating two initramfs_data.cpio* files, one without extension, and one with .gz. This causes usr/Makefile to track usr/initramfs_data.cpio.gz, and not usr/initramfs_data.cpio anymore, that is also largely problematic after 9e3596b0c6539e ("kbuild: initramfs cleanup, set target from Kconfig") because we used to track all possible initramfs_data files in the $(targets) variable before that commit. The end result is that the kernel with an initramfs clearly does not contain what we expect it to, it has a stale initramfs_data.cpio file built into it, and we keep re-generating an initramfs_data.cpio.gz file which is not the one that we want to include in the kernel image proper. The fix consists in hiding CONFIG_INITRAMFS_COMPRESSION when CONFIG_INITRAMFS_SOURCE="". This puts us back in a state to the pre-4.10 behavior where we can properly disable and re-enable initramfs within the same kernel .config file, and be in control of what CONFIG_INITRAMFS_COMPRESSION is set to. Fixes: db2aa7fd15e8 ("initramfs: allow again choice of the embedded initram compression algorithm") Fixes: 9e3596b0c653 ("kbuild: initramfs cleanup, set target from Kconfig") Link: http://lkml.kernel.org/r/20170521033337.6197-1-f.fainelli@gmail.com Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Cc: P J P <ppandit@redhat.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: Michal Marek <mmarek@suse.cz> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02mm: clarify why we want kmalloc before falling backto vmallockMichal Hocko1-2/+5
While converting drm_[cm]alloc* helpers to kvmalloc* variants Chris Wilson has wondered why we want to try kmalloc before vmalloc fallback even for larger allocations requests. Let's clarify that one larger physically contiguous block is less likely to fragment memory than many scattered pages which can prevent more large blocks from being created. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170517080932.21423-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02frv: declare jiffies to be located in the .data sectionMatthias Kaehlcke2-1/+11
Commit 7c30f352c852 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp") removed a section specification from the jiffies declaration that caused conflicts on some platforms. Unfortunately this change broke the build for frv: kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1 kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1 kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1 ... Add __jiffy_arch_data to the declaration of jiffies and use it on frv to include the section specification. For all other platforms __jiffy_arch_data (currently) has no effect. Fixes: 7c30f352c852 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp") Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: David Howells <dhowells@redhat.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02include/linux/gfp.h: fix ___GFP_NOLOCKDEP valueMichal Hocko1-1/+1
Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit. At the time commit 7e7844226f10 ("lockdep: allow to disable reclaim lockup detection") was written we still had __GFP_OTHER_NODE but I have removed it in commit 41b6167e8f74 ("mm: get rid of __GFP_OTHER_NODE") and forgot to lower the bit value. The current value is outside of __GFP_BITS_SHIFT so it cannot be used actually. Fixes: 7e7844226f10 ("lockdep: allow to disable reclaim lockup detection") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Igor Stoppa <igor.stoppa@nokia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02ksm: prevent crash after write_protect_page failsAndrea Arcangeli1-2/+1
"err" needs to be left set to -EFAULT if split_huge_page succeeds. Otherwise if "err" gets clobbered with zero and write_protect_page fails, try_to_merge_one_page() will succeed instead of returning -EFAULT and then try_to_merge_with_ksm_page() will continue thinking kpage is a PageKsm when in fact it's still an anonymous page. Eventually it'll crash in page_add_anon_rmap. This has been reproduced on Fedora25 kernel but I can reproduce with upstream too. The bug was introduced in commit f765f540598a ("ksm: prepare to new THP semantics") introduced in v4.5. page:fffff67546ce1cc0 count:4 mapcount:2 mapping:ffffa094551e36e1 index:0x7f0f46673 flags: 0x2ffffc0004007c(referenced|uptodate|dirty|lru|active|swapbacked) page dumped because: VM_BUG_ON_PAGE(!PageLocked(page)) page->mem_cgroup:ffffa09674bf0000 ------------[ cut here ]------------ kernel BUG at mm/rmap.c:1222! CPU: 1 PID: 76 Comm: ksmd Not tainted 4.9.3-200.fc25.x86_64 #1 RIP: do_page_add_anon_rmap+0x1c4/0x240 Call Trace: page_add_anon_rmap+0x18/0x20 try_to_merge_with_ksm_page+0x50b/0x780 ksm_scan_thread+0x1211/0x1410 ? prepare_to_wait_event+0x100/0x100 ? try_to_merge_with_ksm_page+0x780/0x780 kthread+0xd9/0xf0 ? kthread_park+0x60/0x60 ret_from_fork+0x25/0x30 Fixes: f765f54059 ("ksm: prepare to new THP semantics") Link: http://lkml.kernel.org/r/20170513131040.21732-1-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Federico Simoncelli <fsimonce@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02perf stat: Only print NMI watchdog hint when enabledAndi Kleen1-1/+4
Only print the NMI watchdog hint when that watchdog it actually enabled. This avoids printing these unnecessarily. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857wz1i@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-02ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementationLorenzo Pieralisi1-3/+3
The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes muster from an ACPI specification standpoint. Current macro detects the MADT GICC entry length through ACPI firmware version (it changed from 76 to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification) but always uses (erroneously) the ACPICA (latest) struct (ie struct acpi_madt_generic_interrupt - that is 80-bytes long) length to check if the current GICC entry memory record exceeds the MADT table end in memory as defined by the MADT table header itself, which may result in false negatives depending on the ACPI firmware version and how the MADT entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC entries are 76 bytes long, so by adding 80 to a GICC entry start address in memory the resulting address may well be past the actual MADT end, triggering a false negative). Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks and update them to always use the firmware version specific MADT GICC entry length in order to carry out boundary checks. Fixes: b6cfb277378e ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro") Reported-by: Julien Grall <julien.grall@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Al Stone <ahs3@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-06-02HID: asus: Stop underlying hardware on removeCarlo Caione1-0/+2
We are missing a call to hid_hw_stop() on the remove hook. Among other things this is causing an Oops when (re-)starting GNOME / upowerd / ... after the module has been already rmmod-ed. Signed-off-by: Carlo Caione <carlo@endlessm.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-06-02dmaengine: pl330: fix warning in pl330_removeJean-Philippe Brucker1-1/+2
When removing a device with less than 9 IRQs (AMBA_NR_IRQS), we'll get a big WARN_ON from devres.c because pl330_remove calls devm_free_irqs for unallocated irqs. Similarly to pl330_probe, check that IRQ number is present before calling devm_free_irq. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-06-01Input: tm2-touchkey - use LEN_ON as boolean value instead of LED_FULLAndi Shyti1-1/+1
Commit 4e552c8cb5bc ("leds: add LED_ON brightness as boolean value") has introduced the LED_ON enumeration value that can be used instead of LED_FULL which has more of a linear value. Because the tm2-touchscreen doesn't have brightness levels, but it's a simple on/off led, use LED_ON instead of LED_FULL. Signed-off-by: Andi Shyti <andi.shyti@samsung.com> Reviewed-by: Jaechul Lee <jcsing.lee@samsung.com> Tested-by: Jaechul Lee <jcsing.lee@samsung.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-06-01RDMA/SA: Fix kernel panic in CMA request handler flowMajd Dibbiny5-35/+15
Commit 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields) moved the service_id to be specific attribute for IB and OPA SA Path Record, and thus wasn't assigned for RoCE. This caused to the following kernel panic in the CMA request handler flow: [ 27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 27.074731] IP: __radix_tree_lookup+0x1d/0xe0 ... [ 27.075356] Workqueue: ib_cm cm_work_handler [ib_cm] [ 27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000 [ 27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0 ... [ 27.075979] Call Trace: [ 27.076015] radix_tree_lookup+0xd/0x10 [ 27.076055] cma_ps_find+0x59/0x70 [rdma_cm] [ 27.076097] cma_id_from_event+0xd2/0x470 [rdma_cm] [ 27.076144] ? ib_init_ah_from_path+0x39a/0x590 [ib_core] [ 27.076193] cma_req_handler+0x25/0x480 [rdma_cm] [ 27.076237] cm_process_work+0x25/0x120 [ib_cm] [ 27.076280] ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm] [ 27.076350] cm_req_handler+0xb03/0xd40 [ib_cm] [ 27.076430] ? sched_clock_cpu+0x11/0xb0 [ 27.076478] cm_work_handler+0x194/0x1588 [ib_cm] [ 27.076525] process_one_work+0x160/0x410 [ 27.076565] worker_thread+0x137/0x4a0 [ 27.076614] kthread+0x112/0x150 [ 27.076684] ? max_active_store+0x60/0x60 [ 27.077642] ? kthread_park+0x90/0x90 [ 27.078530] ret_from_fork+0x2c/0x40 This patch moves it back to the common SA Path Record structure and removes the redundant setter and getter. Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively. Fixes: 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB ands ROCE specific fields) Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01RDMA/umem: Fix missing mmap_sem in get umem ODP callLeon Romanovsky1-1/+5
Add mmap_sem lock around VMA inspection in ib_umem_odp_get(). Fixes: 0008b84ea9af ('IB/umem: Add support to huge ODP') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01RDMA/core: not to set page dirty bit if it's already set.Qing Huang1-1/+1
This change will optimize kernel memory deregistration operations. __ib_umem_release() used to call set_page_dirty_lock() against every writable page in its memory region. Its purpose is to keep data synced between CPU and DMA device when swapping happens after mem deregistration ops. Now we choose not to set page dirty bit if it's already set by kernel prior to calling __ib_umem_release(). This reduces memory deregistration time by half or even more when we ran application simulation test program. Signed-off-by: Qing Huang <qing.huang@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01RDMA/uverbs: Declare local function static and add brackets to sizeofLeon Romanovsky1-4/+4
Commit 57520751445b ("IB/SA: Add OPA path record type") introduced new local function __ib_copy_path_rec_to_user, but didn't limit its scope. This produces the following sparse warning: drivers/infiniband/core/uverbs_marshall.c:99:6: warning: symbol '__ib_copy_path_rec_to_user' was not declared. Should it be static? In addition, it used sizeof ... notations instead of sizeof(...), which is correct in C, but a little bit misleading. Let's change it too. Fixes: 57520751445b ("IB/SA: Add OPA path record type") Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01RDMA/netlink: Reduce exposure of RDMA netlink functionsLeon Romanovsky3-11/+11
RDMA netlink is part of ib_core, hence ibnl_chk_listeners(), ibnl_init() and ibnl_cleanup() don't need to be published in public header file. Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these functions to private header file. CC: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01RDMA/srp: Fix NULL deref at srp_destroy_qp()Israel Rukshin1-1/+1
If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq may be NULL. Calling directly to ib_destroy_qp() is sufficient because no work requests were posted on the created qp. Fixes: 9294000d6d89 ("IB/srp: Drain the send queue before destroying a QP") Cc: <stable@vger.kernel.org> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>-- Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01RDMA/IPoIB: Limit the ipoib_dev_uninit_default scopeLeon Romanovsky1-1/+1
ipoib_dev_uninit_default() call is used in ipoib_main.c file only and it generates the following warning from smatch tool: drivers/infiniband/ulp/ipoib/ipoib_main.c:1593:6: warning: symbol 'ipoib_dev_uninit_default' was not declared. Should it be static? so let's declare that function as static. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>