aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-11-17sysvipc: unteach ids->next_id for !CHECKPOINT_RESTOREDavidlohr Bueso1-0/+2
Patch series "sysvipc: ipc-key management improvements". Here are a few improvements I spotted while eyeballing Guillaume's rhashtable implementation for ipc keys. The first and fourth patches are the interesting ones, the middle two are trivial. This patch (of 4): The next_id object-allocation functionality was introduced in commit 03f595668017 ("ipc: add sysctl to specify desired next object id"). Given that these new entries are _only_ exported under the CONFIG_CHECKPOINT_RESTORE option, there is no point for the common case to even know about ->next_id. As such rewrite ipc_buildid() such that it can do away with the field as well as unnecessary branches when adding a new identifier. The end result also better differentiates both cases, so the code ends up being cleaner; albeit the small duplications regarding the default case. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170831172049.14576-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17kernel/reboot.c: add devm_register_reboot_notifier()Andrey Smirnov1-0/+4
Add devm_* wrapper around register_reboot_notifier to simplify device specific reboot notifier registration/unregistration. [akpm@linux-foundation.org: move `struct device' forward decl to top-of-file] Link: http://lkml.kernel.org/r/20170320171753.1705-1-andrew.smirnov@gmail.com Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17kcov: support comparison operands collectionVictor Chibotaru2-4/+32
Enables kcov to collect comparison operands from instrumented code. This is done by using Clang's -fsanitize=trace-cmp instrumentation (currently not available for GCC). The comparison operands help a lot in fuzz testing. E.g. they are used in Syzkaller to cover the interiors of conditional statements with way less attempts and thus make previously unreachable code reachable. To allow separate collection of coverage and comparison operands two different work modes are implemented. Mode selection is now done via a KCOV_ENABLE ioctl call with corresponding argument value. Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com Signed-off-by: Victor Chibotaru <tchibo@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17kernel/panic.c: add TAINT_AUXBorislav Petkov1-1/+2
This is the gist of a patch which we've been forward-porting in our kernels for a long time now and it probably would make a good sense to have such TAINT_AUX flag upstream which can be used by each distro etc, how they see fit. This way, we won't need to forward-port a distro-only version indefinitely. Add an auxiliary taint flag to be used by distros and others. This obviates the need to forward-port whatever internal solutions people have in favor of a single flag which they can map arbitrarily to a definition of their pleasing. The "X" mnemonic could also mean eXternal, which would be taint from a distro or something else but not the upstream kernel. We will use it to mark modules for which we don't provide support. I.e., a really eXternal module. Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17pid: remove pidhashGargi Sharma3-5/+2
pidhash is no longer required as all the information can be looked up from idr tree. nr_hashed represented the number of pids that had been hashed. Since, nr_hashed and PIDNS_HASH_ADDING are no longer relevant, it has been renamed to pid_allocated and PIDNS_ADDING respectively. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> [ia64] Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17pid: replace pid bitmap implementation with IDR APIGargi Sharma1-11/+3
Patch series "Replacing PID bitmap implementation with IDR API", v4. This series replaces kernel bitmap implementation of PID allocation with IDR API. These patches are written to simplify the kernel by replacing custom code with calls to generic code. The following are the stats for pid and pid_namespace object files before and after the replacement. There is a noteworthy change between the IDR and bitmap implementation. Before text data bss dec hex filename 8447 3894 64 12405 3075 kernel/pid.o After text data bss dec hex filename 3397 304 0 3701 e75 kernel/pid.o Before text data bss dec hex filename 5692 1842 192 7726 1e2e kernel/pid_namespace.o After text data bss dec hex filename 2854 216 16 3086 c0e kernel/pid_namespace.o The following are the stats for ps, pstree and calling readdir on /proc for 10,000 processes. ps: With IDR API With bitmap real 0m1.479s 0m2.319s user 0m0.070s 0m0.060s sys 0m0.289s 0m0.516s pstree: With IDR API With bitmap real 0m1.024s 0m1.794s user 0m0.348s 0m0.612s sys 0m0.184s 0m0.264s proc: With IDR API With bitmap real 0m0.059s 0m0.074s user 0m0.000s 0m0.004s sys 0m0.016s 0m0.016s This patch (of 2): Replace the current bitmap implementation for Process ID allocation. Functions that are no longer required, for example, free_pidmap(), alloc_pidmap(), etc. are removed. The rest of the functions are modified to use the IDR API. The change was made to make the PID allocation less complex by replacing custom code with calls to generic API. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com [avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl] Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17pipe: add proc_dopipe_max_size() to safely assign pipe_max_sizeJoe Lawrence2-0/+4
pipe_max_size is assigned directly via procfs sysctl: static struct ctl_table fs_table[] = { ... { .procname = "pipe-max-size", .data = &pipe_max_size, .maxlen = sizeof(int), .mode = 0644, .proc_handler = &pipe_proc_fn, .extra1 = &pipe_min_size, }, ... int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf, size_t *lenp, loff_t *ppos) { ... ret = proc_dointvec_minmax(table, write, buf, lenp, ppos) ... and then later rounded in-place a few statements later: ... pipe_max_size = round_pipe_size(pipe_max_size); ... This leaves a window of time between initial assignment and rounding that may be visible to other threads. (For example, one thread sets a non-rounded value to pipe_max_size while another reads its value.) Similar reads of pipe_max_size are potentially racy: pipe.c :: alloc_pipe_info() pipe.c :: pipe_set_size() Add a new proc_dopipe_max_size() that consolidates reading the new value from the user buffer, verifying bounds, and calling round_pipe_size() with a single assignment to pipe_max_size. Link: http://lkml.kernel.org/r/1507658689-11669-4-git-send-email-joe.lawrence@redhat.com Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17lib/genalloc.c: make the avail variable an atomic_long_tStephen Bates1-1/+2
If the amount of resources allocated to a gen_pool exceeds 2^32 then the avail atomic overflows and this causes problems when clients try and borrow resources from the pool. This is only expected to be an issue on 64 bit systems. Add the <linux/atomic.h> header to pull in atomic_long* operations. So that 32 bit systems continue to use atomic32_t but 64 bit systems can use atomic64_t. Link: http://lkml.kernel.org/r/1509033843-25667-1-git-send-email-sbates@raithlin.com Signed-off-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Daniel Mentz <danielmentz@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17include/linux/radix-tree.h: remove unneeded #include <linux/bug.h>Masahiro Yamada1-1/+0
This include was added by commit 187f1882b5b0 ("BUG: headers with BUG/BUG_ON etc. need linux/bug.h") because BUG_ON() was used in this header at that time. Some time later, commit 6d75f366b924 ("lib: radix-tree: check accounting of existing slot replacement users") removed the use of BUG_ON() from this header. Since then, there is no reason to include <linux/bug.h>. Link: http://lkml.kernel.org/r/1505660151-4383-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Mi <chrism@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17include/linux/bitfield.h: include <linux/build_bug.h> instead of <linux/bug.h>Masahiro Yamada1-1/+1
Since commit bc6245e5efd7 ("bug: split BUILD_BUG stuff out into <linux/build_bug.h>"), #include <linux/build_bug.h> is better to pull minimal headers needed for BUILG_BUG() family. Link: http://lkml.kernel.org/r/1505700775-19826-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Dinan Gunawardena <dinan.gunawardena@netronome.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17include/linux/compiler-clang.h: handle randomizable anonymous structsSandipan Das1-0/+3
The GCC randomize layout plugin can randomize the member offsets of sensitive kernel data structures. To use this feature, certain annotations and members are added to the structures which affect the member offsets even if this plugin is not used. All of these structures are completely randomized, except for task_struct which leaves out some of its members. All the other members are wrapped within an anonymous struct with the __randomize_layout attribute. This is done using the randomized_struct_fields_start and randomized_struct_fields_end defines. When the plugin is disabled, the behaviour of this attribute can vary based on the GCC version. For GCC 5.1+, this attribute maps to __designated_init otherwise it is just an empty define but the anonymous structure is still present. For other compilers, both randomized_struct_fields_start and randomized_struct_fields_end default to empty defines meaning the anonymous structure is not introduced at all. So, if a module compiled with Clang, such as a BPF program, needs to access task_struct fields such as pid and comm, the offsets of these members as recognized by Clang are different from those recognized by modules compiled with GCC. If GCC 4.6+ is used to build the kernel, this can be solved by introducing appropriate defines for Clang so that the anonymous structure is seen when determining the offsets for the members. Link: http://lkml.kernel.org/r/20171109064645.25581-1-sandipan@linux.vnet.ibm.com Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17bug: fix "cut here" location for __WARN_TAINT architecturesKees Cook1-2/+3
Prior to v4.11, x86 used warn_slowpath_fmt() for handling WARN()s. After WARN() was moved to using UD0 on x86, the warning text started appearing _before_ the "cut here" line. This appears to have been a long-standing bug on architectures that used __WARN_TAINT, but it didn't get fixed. v4.11 and earlier on x86: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2956 at drivers/misc/lkdtm_bugs.c:65 lkdtm_WARNING+0x21/0x30 This is a warning message Modules linked in: v4.12 and later on x86: This is a warning message ------------[ cut here ]------------ WARNING: CPU: 1 PID: 2982 at drivers/misc/lkdtm_bugs.c:68 lkdtm_WARNING+0x15/0x20 Modules linked in: With this fix: ------------[ cut here ]------------ This is a warning message WARNING: CPU: 3 PID: 3009 at drivers/misc/lkdtm_bugs.c:67 lkdtm_WARNING+0x15/0x20 Since the __FILE__ reporting happens as part of the UD0 handler, it isn't trivial to move the message to after the WARNING line, but at least we can fix the position of the "cut here" line so all the various logging tools will start including the actual runtime warning message again, when they follow the instruction and "cut here". Link: http://lkml.kernel.org/r/1510100869-73751-4-git-send-email-keescook@chromium.org Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0") Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17bug: define the "cut here" string in a single placeKees Cook1-0/+2
The "cut here" string is used in a few paths. Define it in a single place. Link: http://lkml.kernel.org/r/1510100869-73751-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17iopoll: avoid -Wint-in-bool-context warningArnd Bergmann1-9/+15
When we pass the result of a multiplication as the timeout or the delay, we can get a warning from gcc-7: drivers/mmc/host/bcm2835.c:596:149: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/mfd/arizona-core.c:247:195: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c:49:27: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] The warning is a bit questionable inside of a macro, but this is intentional on the side of the gcc developers. It is also an indication of another problem: we evaluate the timeout and sleep arguments multiple times, which can have undesired side-effects when those are complex expressions. This changes the two iopoll variants to use local variables for storing copies of the timeouts. This adds some more type safety, and avoids both the double-evaluation and the gcc warning. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81484 Link: http://lkml.kernel.org/r/20170726133756.2161367-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20171102114048.1526955-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17kernel debug: support resetting WARN_ONCE for all architecturesAndi Kleen1-0/+5
Some architectures store the WARN_ONCE state in the flags field of the bug_entry. Clear that one too when resetting once state through /sys/kernel/debug/clear_warn_once Pointed out by Michael Ellerman Improves the earlier patch that add clear_warn_once. [ak@linux.intel.com: add a missing ifdef CONFIG_MODULES] Link: http://lkml.kernel.org/r/20171020170633.9593-1-andi@firstfloor.org [akpm@linux-foundation.org: fix unused var warning] [akpm@linux-foundation.org: Use 0200 for clear_warn_once file, per mpe] [akpm@linux-foundation.org: clear BUGFLAG_DONE in clear_once_table(), per mpe] Link: http://lkml.kernel.org/r/20171019204642.7404-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17kernel debug: support resetting WARN*_ONCEAndi Kleen3-3/+7
I like _ONCE warnings because it's guaranteed that they don't flood the log. During testing I find it useful to reset the state of the once warnings, so that I can rerun tests and see if they trigger again, or can guarantee that a test run always hits the same warnings. This patch adds a debugfs interface to reset all the _ONCE warnings so that they appear again: echo 1 > /sys/kernel/debug/clear_warn_once This is implemented by putting all the warning booleans into a special section, and clearing it. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20171017221455.6740-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17mm, compaction: persistently skip hugetlbfs pageblocksDavid Rientjes1-0/+11
It is pointless to migrate hugetlb memory as part of memory compaction if the hugetlb size is equal to the pageblock order. No defragmentation is occurring in this condition. It is also pointless to for the freeing scanner to scan a pageblock where a hugetlb page is pinned. Unconditionally skip these pageblocks, and do so peristently so that they are not rescanned until it is observed that these hugepages are no longer pinned. It would also be possible to do this by involving the hugetlb subsystem in marking pageblocks to no longer be skipped when they hugetlb pages are freed. This is a simple solution that doesn't involve any additional subsystems in pageblock skip manipulation. [rientjes@google.com: fix build] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708201734390.117182@chino.kir.corp.google.com Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708151639130.106658@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Tested-by: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17mm: fix nodemask printingArnd Bergmann1-3/+10
The cleanup caused build warnings for constant mask pointers: mm/mempolicy.c: In function `mpol_to_str': ./include/linux/nodemask.h:108:11: warning: the comparison will always evaluate as `true' for the address of `nodes' will never be NULL [-Waddress] An earlier workaround I suggested was incorporated in the version that got merged, but that only solved the problem for gcc-7 and higher, while gcc-4.6 through gcc-6.x still warn. This changes the printing again to use inline functions that make it clear to the compiler that the line that does the NULL check has no idea whether the argument is a constant NULL. Link: http://lkml.kernel.org/r/20171117101545.119689-1-arnd@arndb.de Fixes: 0205f75571e3 ("mm: simplify nodemask printing") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Zhangshaokun <zhangshaokun@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17Merge tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds10-10/+83
Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - 957ac8c421ad ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...
2017-11-16Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds7-17/+177
Pull ARM SoC driver updates from Arnd Bergmann: "This branch contains platform-related driver updates for ARM and ARM64, these are the areas that bring the changes: New drivers: - driver support for Renesas R-Car V3M (R8A77970) - power management support for Amlogic GX - a new driver for the Tegra BPMP thermal sensor - a new bus driver for Technologic Systems NBUS Changes for subsystems that prefer to merge through arm-soc: - the usual updates for reset controller drivers from Philipp Zabel, with five added drivers for SoCs in the arc, meson, socfpa, uniphier and mediatek families - updates to the ARM SCPI and PSCI frameworks, from Sudeep Holla, Heiner Kallweit and Lorenzo Pieralisi Changes specific to some ARM-based SoC - the Freescale/NXP DPAA QBMan drivers from PowerPC can now work on ARM as well - several changes for power management on Broadcom SoCs - various improvements on Qualcomm, Broadcom, Amlogic, Atmel, Mediatek - minor Cleanups for Samsung, TI OMAP SoCs" [ NOTE! This doesn't work without the previous ARM SoC device-tree pull, because the R8A77970 driver is missing a header file that came from that pull. The fact that this got merged afterwards only fixes it at this point, and bisection of that driver will fail if/when you walk into the history of that driver. - Linus ] * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (96 commits) soc: amlogic: meson-gx-pwrc-vpu: fix power-off when powered by bootloader bus: add driver for the Technologic Systems NBUS memory: omap-gpmc: Remove deprecated gpmc_update_nand_reg() soc: qcom: remove unused label soc: amlogic: gx pm domain: add PM and OF dependencies drivers/firmware: psci_checker: Add missing destroy_timer_on_stack() dt-bindings: power: add amlogic meson power domain bindings soc: amlogic: add Meson GX VPU Domains driver soc: qcom: Remote filesystem memory driver dt-binding: soc: qcom: Add binding for rmtfs memory of: reserved_mem: Accessor for acquiring reserved_mem of/platform: Generalize /reserved-memory handling soc: mediatek: pwrap: fix fatal compiler error soc: mediatek: pwrap: fix compiler errors arm64: mediatek: cleanup message for platform selection soc: Allow test-building of MediaTek drivers soc: mediatek: place Kconfig for all SoC drivers under menu soc: mediatek: pwrap: add support for MT7622 SoC soc: mediatek: pwrap: add common way for setup CS timing extenstion soc: mediatek: pwrap: add MediaTek MT6380 as one slave of pwrap ..
2017-11-16Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds11-4180/+99
Pull ARM device-tree updates from Arnd Bergmann: "We add device tree files for a couple of additional SoCs in various areas: Allwinner R40/V40 for entertainment, Broadcom Hurricane 2 for networking, Amlogic A113D for audio, and Renesas R-Car V3M for automotive. As usual, lots of new boards get added based on those and other SoCs: - Actions S500 based CubieBoard6 single-board computer - Amlogic Meson-AXG A113D based development board - Amlogic S912 based Khadas VIM2 single-board computer - Amlogic S912 based Tronsmart Vega S96 set-top-box - Allwinner H5 based NanoPi NEO Plus2 single-board computer - Allwinner R40 based Banana Pi M2 Ultra and Berry single-board computers - Allwinner A83T based TBS A711 Tablet - Broadcom Hurricane 2 based Ubiquiti UniFi Switch 8 - Broadcom bcm47xx based Luxul XAP-1440/XAP-810/ABR-4500/XBR-4500 wireless access points and routers - NXP i.MX51 based Zodiac Inflight Innovations RDU1 board - NXP i.MX53 based GE Healthcare PPD biometric monitor - NXP i.MX6 based Pistachio single-board computer - NXP i.MX6 based Vining-2000 automotive diagnostic interface - NXP i.MX6 based Ka-Ro TX6 Computer-on-Module in additional variants - Qualcomm MSM8974 (Snapdragon 800) based Fairphone 2 phone - Qualcomm MSM8974pro (Snapdragon 801) based Sony Xperia Z2 Tablet - Realtek RTD1295 based set-top-boxes MeLE V9 and PROBOX2 AVA - Renesas R-Car V3M (R8A77970) SoC and "Eagle" reference board - Renesas H3ULCB and M3ULCB "Kingfisher" extension infotainment boards - Renasas r8a7745 based iWave G22D-SODIMM SoM - Rockchip rk3288 based Amarula Vyasa single-board computer - Samsung Exynos5800 based Odroid HC1 single-board computer For existing SoC support, there was a lot of ongoing work, as usual most of that concentrated on the Renesas, Rockchip, OMAP, i.MX, Amlogic and Allwinner platforms, but others were also active. Rob Herring and many others worked on reducing the number of issues that the latest version of 'dtc' now warns about. Unfortunately there is still a lot left to do. A rework of the ARM foundation model introduced several new files for common variations of the model" * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (599 commits) arm64: dts: uniphier: route on-board device IRQ to GPIO controller for PXs3 dt-bindings: bus: Add documentation for the Technologic Systems NBUS arm64: dts: actions: s900-bubblegum-96: Add fake uart5 clock ARM: dts: owl-s500: Add CubieBoard6 dt-bindings: arm: actions: Add CubieBoard6 ARM: dts: owl-s500-guitar-bb-rev-b: Add fake uart3 clock ARM: dts: owl-s500: Set power domains for CPU2 and CPU3 arm: dts: mt7623: remove unused compatible string for pio node arm: dts: mt7623: update usb related nodes arm: dts: mt7623: update crypto node ARM: dts: sun8i: a711: Enable USB OTG ARM: dts: sun8i: a711: Add regulator support ARM: dts: sun8i: a83t: bananapi-m3: Enable AP6212 WiFi on mmc1 ARM: dts: sun8i: a83t: cubietruck-plus: Enable AP6330 WiFi on mmc1 ARM: dts: sun8i: a83t: Move mmc1 pinctrl setting to dtsi file ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator nodes ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes ARM: dts: sunxi: Add dtsi for AXP81x PMIC arm64: dts: allwinner: H5: Restore EMAC changes ...
2017-11-16Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds2-24/+70
Pull ARM SoC platform updates from Arnd Bergmann: "Most of the commits are for defconfig changes, to enable newly added drivers or features that people have started using. For the changed lines lines, we have mostly cleanups, the affected platforms are OMAP, Versatile, EP93xx, Samsung, Broadcom, i.MX, and Actions. The largest single change is the introduction of the TI "sysc" bus driver, with the intention of cleaning up more legacy code. Two new SoC platforms get added this time: - Allwinner R40 is a modernized version of the A20 chip, now with a Quad-Core ARM Cortex-A7. According to the manufacturer, it is intended for "Smart Hardware" - Broadcom Hurricane 2 (Aka Strataconnect BCM5334X) is a family of chips meant for managed gigabit ethernet switches, based around a Cortex-A9 CPU. Finally, we gain SMP support for two platforms: Renesas R-Car E2 and Amlogic Meson8/8b, which were previously added but only supported uniprocessor operation" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (118 commits) ARM: multi_v7_defconfig: Select RPMSG_VIRTIO as module ARM: multi_v7_defconfig: enable CONFIG_GPIO_UNIPHIER arm64: defconfig: enable CONFIG_GPIO_UNIPHIER ARM: meson: enable MESON_IRQ_GPIO in Kconfig for meson8b ARM: meson: Add SMP bringup code for Meson8 and Meson8b ARM: smp_scu: allow the platform code to read the SCU CPU status ARM: smp_scu: add a helper for powering on a specific CPU dt-bindings: Amlogic: Add Meson8 and Meson8b SMP related documentation ARM: OMAP3: Delete an unnecessary variable initialisation in omap3xxx_hwmod_init() ARM: OMAP3: Use common error handling code in omap3xxx_hwmod_init() ARM: defconfig: select the right SX150X driver arm64: defconfig: Enable QCOM_IOMMU arm64: Add ThunderX drivers to defconfig arm64: defconfig: Enable Tegra PCI controller cpufreq: imx6q: Move speed grading check to cpufreq driver arm64: defconfig: re-enable Qualcomm DB410c USB ARM: configs: stm32: Add MDMA support in STM32 defconfig ARM: imx: Enable cpuidle for i.MX6DL starting at 1.1 bus: ti-sysc: Fix unbalanced pm_runtime_enable by adding remove bus: ti-sysc: mark PM functions as __maybe_unused ...
2017-11-16Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-1/+34
Pull virtio updates from Michael Tsirkin: "Fixes in qemu, vhost and virtio" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: fw_cfg: fix the command line module name vhost/vsock: fix uninitialized vhost_vsock->guest_cid vhost: fix end of range for access_ok vhost/scsi: Use safe iteration in vhost_scsi_complete_cmd_work() virtio_balloon: fix deadlock on OOM
2017-11-16Merge tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds3-1/+71
Pull xen updates from Juergen Gross: "Xen features and fixes for v4.15-rc1 Apart from several small fixes it contains the following features: - a series by Joao Martins to add vdso support of the pv clock interface - a series by Juergen Gross to add support for Xen pv guests to be able to run on 5 level paging hosts - a series by Stefano Stabellini adding the Xen pvcalls frontend driver using a paravirtualized socket interface" * tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits) xen/pvcalls: fix potential endless loop in pvcalls-front.c xen/pvcalls: Add MODULE_LICENSE() MAINTAINERS: xen, kvm: track pvclock-abi.h changes x86/xen/time: setup vcpu 0 time info page x86/xen/time: set pvclock flags on xen_time_init() x86/pvclock: add setter for pvclock_pvti_cpu0_va ptp_kvm: probe for kvm guest availability xen/privcmd: remove unused variable pageidx xen: select grant interface version xen: update arch/x86/include/asm/xen/cpuid.h xen: add grant interface version dependent constants to gnttab_ops xen: limit grant v2 interface to the v1 functionality xen: re-introduce support for grant v2 interface xen: support priv-mapping in an HVM tools domain xen/pvcalls: remove redundant check for irq >= 0 xen/pvcalls: fix unsigned less than zero error check xen/time: Return -ENODEV from xen_get_wallclock() xen/pvcalls-front: mark expected switch fall-through xen: xenbus_probe_frontend: mark expected switch fall-throughs xen/time: do not decrease steal time after live migration on xen ...
2017-11-16Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-7/+21
Pull KVM updates from Radim Krčmář: "First batch of KVM changes for 4.15 Common: - Python 3 support in kvm_stat - Accounting of slabs to kmemcg ARM: - Optimized arch timer handling for KVM/ARM - Improvements to the VGIC ITS code and introduction of an ITS reset ioctl - Unification of the 32-bit fault injection logic - More exact external abort matching logic PPC: - Support for running hashed page table (HPT) MMU mode on a host that is using the radix MMU mode; single threaded mode on POWER 9 is added as a pre-requisite - Resolution of merge conflicts with the last second 4.14 HPT fixes - Fixes and cleanups s390: - Some initial preparation patches for exitless interrupts and crypto - New capability for AIS migration - Fixes x86: - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs, and after-reset state - Refined dependencies for VMX features - Fixes for nested SMI injection - A lot of cleanups" * tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits) KVM: s390: provide a capability for AIS state migration KVM: s390: clear_io_irq() requests are not expected for adapter interrupts KVM: s390: abstract conversion between isc and enum irq_types KVM: s390: vsie: use common code functions for pinning KVM: s390: SIE considerations for AP Queue virtualization KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup KVM: PPC: Book3S HV: Cosmetic post-merge cleanups KVM: arm/arm64: fix the incompatible matching for external abort KVM: arm/arm64: Unify 32bit fault injection KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared KVM: arm/arm64: vgic-its: New helper functions to free the caches KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device arm/arm64: KVM: Load the timer state when enabling the timer KVM: arm/arm64: Rework kvm_timer_should_fire KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit KVM: arm/arm64: Move phys_timer_emulate function KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps ...
2017-11-16Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds1-7/+16
Pull user namespace update from Eric Biederman: "The only change that is production ready this round is the work to increase the number of uid and gid mappings a user namespace can support from 5 to 340. This code was carefully benchmarked and it was confirmed that in the existing cases the performance remains the same. In the worst case with 340 mappings an cache cold stat times go from 158ns to 248ns. That is noticable but still quite small, and only the people who are doing crazy things pay the cost. This work uncovered some documentation and cleanup opportunities in the mapping code, and patches to make those cleanups and improve the documentation will be coming in the next merge window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Simplify insert_extent userns: Make map_id_down a wrapper for map_id_range_down userns: Don't read extents twice in m_start userns: Simplify the user and group mapping functions userns: Don't special case a count of 0 userns: bump idmap limits to 340 userns: use union in {g,u}idmap struct
2017-11-16Merge tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fsLinus Torvalds2-9/+117
Pull f2fs updates from Jaegeuk Kim: "In this round, we introduce sysfile-based quota support which is required for Android by default. In addition, we allow that users are able to reserve some blocks in runtime to mitigate performance drops in low free space. Enhancements: - assign proper data segments according to write_hints given by user - issue cache_flush on dirty devices only among multiple devices - exploit cp_error flag and add more faults to enhance fault injection test - conduct more readaheads during f2fs_readdir - add a range for discard commands Bug fixes: - fix zero stat->st_blocks when inline_data is set - drop crypto key and free stale memory pointer while evict_inode is failing - fix some corner cases in free space and segment management - fix wrong last_disk_size This series includes lots of clean-ups and code enhancement in terms of xattr operations, discard/flush command control. In addition, it adds versatile debugfs entries to monitor f2fs status" * tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (75 commits) f2fs: deny accessing encryption policy if encryption is off f2fs: inject fault in inc_valid_node_count f2fs: fix to clear FI_NO_PREALLOC f2fs: expose quota information in debugfs f2fs: separate nat entry mem alloc from nat_tree_lock f2fs: validate before set/clear free nat bitmap f2fs: avoid opened loop codes in __add_ino_entry f2fs: apply write hints to select the type of segments for buffered write f2fs: introduce scan_curseg_cache for cleanup f2fs: optimize the way of traversing free_nid_bitmap f2fs: keep scanning until enough free nids are acquired f2fs: trace checkpoint reason in fsync() f2fs: keep isize once block is reserved cross EOF f2fs: avoid race in between GC and block exchange f2fs: save a multiplication for last_nid calculation f2fs: fix summary info corruption f2fs: remove dead code in update_meta_page f2fs: remove unneeded semicolon f2fs: don't bother with inode->i_version f2fs: check curseg space before foreground GC ...
2017-11-16Merge tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds3-6/+303
Pull AFS updates from David Howells: "kAFS filesystem driver overhaul. The major points of the overhaul are: (1) Preliminary groundwork is laid for supporting network-namespacing of kAFS. The remainder of the namespacing work requires some way to pass namespace information to submounts triggered by an automount. This requires something like the mount overhaul that's in progress. (2) sockaddr_rxrpc is used in preference to in_addr for holding addresses internally and add support for talking to the YFS VL server. With this, kAFS can do everything over IPv6 as well as IPv4 if it's talking to servers that support it. (3) Callback handling is overhauled to be generally passive rather than active. 'Callbacks' are promises by the server to tell us about data and metadata changes. Callbacks are now checked when we next touch an inode rather than actively going and looking for it where possible. (4) File access permit caching is overhauled to store the caching information per-inode rather than per-directory, shared over subordinate files. Whilst older AFS servers only allow ACLs on directories (shared to the files in that directory), newer AFS servers break that restriction. To improve memory usage and to make it easier to do mass-key removal, permit combinations are cached and shared. (5) Cell database management is overhauled to allow lighter locks to be used and to make cell records autonomous state machines that look after getting their own DNS records and cleaning themselves up, in particular preventing races in acquiring and relinquishing the fscache token for the cell. (6) Volume caching is overhauled. The afs_vlocation record is got rid of to simplify things and the superblock is now keyed on the cell and the numeric volume ID only. The volume record is tied to a superblock and normal superblock management is used to mediate the lifetime of the volume fscache token. (7) File server record caching is overhauled to make server records independent of cells and volumes. A server can be in multiple cells (in such a case, the administrator must make sure that the VL services for all cells correctly reflect the volumes shared between those cells). Server records are now indexed using the UUID of the server rather than the address since a server can have multiple addresses. (8) File server rotation is overhauled to handle VMOVED, VBUSY (and similar), VOFFLINE and VNOVOL indications and to handle rotation both of servers and addresses of those servers. The rotation will also wait and retry if the server says it is busy. (9) Data writeback is overhauled. Each inode no longer stores a list of modified sections tagged with the key that authorised it in favour of noting the modified region of a page in page->private and storing a list of keys that made modifications in the inode. This simplifies things and allows other keys to be used to actually write to the server if a key that made a modification becomes useless. (10) Writable mmap() is implemented. This allows a kernel to be build entirely on AFS. Note that Pre AFS-3.4 servers are no longer supported, though this can be added back if necessary (AFS-3.4 was released in 1998)" * tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (35 commits) afs: Protect call->state changes against signals afs: Trace page dirty/clean afs: Implement shared-writeable mmap afs: Get rid of the afs_writeback record afs: Introduce a file-private data record afs: Use a dynamic port if 7001 is in use afs: Fix directory read/modify race afs: Trace the sending of pages afs: Trace the initiation and completion of client calls afs: Fix documentation on # vs % prefix in mount source specification afs: Fix total-length calculation for multiple-page send afs: Only progress call state at end of Tx phase from rxrpc callback afs: Make use of the YFS service upgrade to fully support IPv6 afs: Overhaul volume and server record caching and fileserver rotation afs: Move server rotation code into its own file afs: Add an address list concept afs: Overhaul cell database management afs: Overhaul permit caching afs: Overhaul the callback handling afs: Rename struct afs_call server member to cm_server ...
2017-11-16Merge tag 'pinctrl-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrlLinus Torvalds4-6/+11
Pull pin control updates from Linus Walleij: "This is the bulk of pin control changes for the v4.15 kernel cycle: Core: - The pin control Kconfig entry PINCTRL is now turned into a menuconfig option. This obviously has the implication of making the subsystem menu visible in menuconfig. This is happening because of two things: (a) Intel have started to deploy and depend on pin controllers in a way that is affecting users directly. This happens on the highly integrated laptop chipsets named after geographical places: baytrail, broxton, cannonlake, cedarfork, cherryview, denverton, geminilake, lewisburg, merrifield, sunrisepoint... It started a while back and now it is ever more evident that this is crucial infrastructure for x86 laptops and not an embedded obscurity anymore. Users need to be aware. (b) Pin control expanders on I2C and SPI that are arch-agnostic. Currently Semtech SX150X and Microchip MCP28x08 but more are expected. Users will have to be able to configure these in directly for their set-up. - Just go and select GPIOLIB now that we made sure that GPIOLIB is a very vanilla subsystem. Do not depend on it, if we need it, select it. - Exposing the pin control subsystem in menuconfig uncovered a bunch of obscure bugs that are now hopefully fixed, all more or less pertaining to Blackfin. - Unified namespace for cross-calls between pin control and GPIO. - New support for clock skew/delay generic DT bindings and generic pin config options for this. - Minor documentation improvements. Various: - The Renesas SH-PFC pin controller has evolved a lot. It seems Renesas are churning out new SoCs by the minute. - A bunch of non-critical fixes for the Rockchip driver. - Improve the use of library functions instead of open coding. - Support the MCP28018 variant in the MCP28x08 driver. - Static constifying" * tag 'pinctrl-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (91 commits) pinctrl: gemini: Fix missing pad descriptions pinctrl: Add some depends on HAS_IOMEM pinctrl: samsung/s3c24xx: add CONFIG_OF dependency pinctrl: gemini: Fix GMAC groups pinctrl: qcom: spmi-gpio: Add pmi8994 gpio support pinctrl: ti-iodelay: remove redundant unused variable dev pinctrl: max77620: Use common error handling code in max77620_pinconf_set() pinctrl: gemini: Implement clock skew/delay config pinctrl: gemini: Use generic DT parser pinctrl: Add skew-delay pin config and bindings pinctrl: armada-37xx: Add edge both type gpio irq support pinctrl: uniphier: remove eMMC hardware reset pin-mux pinctrl: rockchip: Add iomux-route switching support for rk3288 pinctrl: intel: Add Intel Cedar Fork PCH pin controller support pinctrl: intel: Make offset to interrupt status register configurable pinctrl: sunxi: Enforce the strict mode by default pinctrl: sunxi: Disable strict mode for old pinctrl drivers pinctrl: sunxi: Introduce the strict flag pinctrl: sh-pfc: Save/restore registers for PSCI system suspend pinctrl: sh-pfc: r8a7796: Use generic IOCTRL register description ...
2017-11-16Merge tag 'mfd-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfdLinus Torvalds2-6/+84
Pull MFD updates from Lee Jones: "New drivers: - Add support for Cherry Trail Dollar Cove TI PMIC - Add support for Add Spreadtrum SC27xx series PMICs New device support: - Add support Regulator to axp20x New functionality: - Add DT support; aspeed-scu sc27xx-pmic - Add power saving support; rts5249 Fix-ups: - DT clean-up/rework; tps65217, max77693, iproc-cdru, iproc-mhb, tps65218 - Staticise/constify; stw481x - Use new succinct IRQ API; fsl-imx25-tsadc - Kconfig fix-ups; MFD_TPS65218 - Identify SPI method; lpc_ich - Use managed resources (devm_*) calls; ssbi - Remove unused/obsolete code/documentation; mc13xxx Bug fixes: - Fix typo in MAINTAINERS - Fix error handling; mxs-lradc - Clean-up IRQs on .remove; fsl-imx25-tsadc" * tag 'mfd-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (21 commits) dt-bindings: mfd: mc13xxx: Remove obsolete property mfd: axp20x: Add axp20x-regulator cell for AXP813 mfd: Add Spreadtrum SC27xx series PMICs driver dt-bindings: mfd: Add Spreadtrum SC27xx PMIC documentation mfd: ssbi: Use devm_of_platform_populate() mfd: fsl-imx25: Clean up irq settings during removal mfd: mxs-lradc: Fix error handling in mxs_lradc_probe() mfd: lpc_ich: Avoton/Rangeley uses SPI_BYT method mfd: tps65218: Introduce dependency on CONFIG_OF mfd: tps65218: Correct the config description MAINTAINERS: Fix Dialog search term for watchdog binding file mfd: fsl-imx25: Set irq handler and data in one go mfd: rts5249: Add support for RTS5250S power saving ACPI / PMIC: Add opregion driver for Intel Dollar Cove TI PMIC mfd: Add support for Cherry Trail Dollar Cove TI PMIC syscon: dt-bindings: Add binding document for iProc MHB block syscon: dt-bindings: Add binding doc for Broadcom iProc CDRU mfd: max77693: Add muic of_compatible in mfd_cell mfd: stw481x: Make three arrays static const, reduces object code size mfd: tps65217: Introduce dependency on CONFIG_OF ...
2017-11-16Merge tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds2-0/+11
Pull char/misc updates from Greg KH: "Here is the big set of char/misc and other driver subsystem patches for 4.15-rc1. There are small changes all over here, hyperv driver updates, pcmcia driver updates, w1 driver updats, vme driver updates, nvmem driver updates, and lots of other little one-off driver updates as well. The shortlog has the full details. All of these have been in linux-next for quite a while with no reported issues" * tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits) VME: Return -EBUSY when DMA list in use w1: keep balance of mutex locks and refcnts MAINTAINERS: Update VME subsystem tree. nvmem: sunxi-sid: add support for A64/H5's SID controller nvmem: imx-ocotp: Update module description nvmem: imx-ocotp: Enable i.MX7D OTP write support nvmem: imx-ocotp: Add i.MX7D timing write clock setup support nvmem: imx-ocotp: Move i.MX6 write clock setup to dedicated function nvmem: imx-ocotp: Add support for banked OTP addressing nvmem: imx-ocotp: Pass parameters via a struct nvmem: imx-ocotp: Restrict OTP write to IMX6 processors nvmem: uniphier: add UniPhier eFuse driver dt-bindings: nvmem: add description for UniPhier eFuse nvmem: set nvmem->owner to nvmem->dev->driver->owner if unset nvmem: qfprom: fix different address space warnings of sparse nvmem: mtk-efuse: fix different address space warnings of sparse nvmem: mtk-efuse: use stack for nvmem_config instead of malloc'ing it nvmem: imx-iim: use stack for nvmem_config instead of malloc'ing it thunderbolt: tb: fix use after free in tb_activate_pcie_devices MAINTAINERS: Add git tree for Thunderbolt development ...
2017-11-16Merge tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds1-32/+6
Pull driver core updates from Greg KH: "Here is the set of driver core / debugfs patches for 4.15-rc1. Not many here, mostly all are debugfs fixes to resolve some long-reported problems with files going away with references to them in userspace. There's also some SPDX cleanups for the debugfs code, as well as a few other minor driver core changes for issues reported by people. All of these have been in linux-next for a week or more with no reported issues" * tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Fix device link deferred probe debugfs: Remove redundant license text debugfs: add SPDX identifiers to all debugfs files debugfs: defer debugfs_fsdata allocation to first usage debugfs: call debugfs_real_fops() only after debugfs_file_get() debugfs: purge obsolete SRCU based removal protection IB/hfi1: convert to debugfs_file_get() and -put() debugfs: convert to debugfs_file_get() and -put() debugfs: debugfs_real_fops(): drop __must_hold sparse annotation debugfs: implement per-file removal protection debugfs: add support for more elaborate ->d_fsdata driver core: Move device_links_purge() after bus_remove_device() arch_topology: Fix section miss match warning due to free_raw_capacity() driver-core: pr_err() strings should end with newlines
2017-11-16Merge tag 'kvm-s390-next-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linuxRadim Krčmář2-0/+2
KVM: s390: fixes and improvements for 4.15 - Some initial preparation patches for exitless interrupts and crypto - New capability for AIS migration - Fixes - merge of the sthyi tree from the base s390 team, which moves the sthyi out of KVM into a shared function also for non-KVM
2017-11-15Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds43-275/+1638
Pull drm updates from Dave Airlie: "This is the main drm pull request for v4.15. Core: - Atomic object lifetime fixes - Atomic iterator improvements - Sparse/smatch fixes - Legacy kms ioctls to be interruptible - EDID override improvements - fb/gem helper cleanups - Simple outreachy patches - Documentation improvements - Fix dma-buf rcu races - DRM mode object leasing for improving VR use cases. - vgaarb improvements for non-x86 platforms. New driver: - tve200: Faraday Technology TVE200 block. This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in the StorLink SL3516 (later Cortina Systems CS3516) as well as the Grain Media GM8180. New bridges: - SiI9234 support New panels: - S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba LT089AC19000, Innolux AT043TN24 i915: - Remove Coffeelake from alpha support - Cannonlake workarounds - Infoframe refactoring for DisplayPort - VBT updates - DisplayPort vswing/emph/buffer translation refactoring - CCS fixes - Restore GPU clock boost on missed vblanks - Scatter list updates for userptr allocations - Gen9+ transition watermarks - Display IPC (Isochronous Priority Control) - Private PAT management - GVT: improved error handling and pci config sanitizing - Execlist refactoring - Transparent Huge Page support - User defined priorities support - HuC/GuC firmware refactoring - DP MST fixes - eDP power sequencing fixes - Use RCU instead of stop_machine - PSR state tracking support - Eviction fixes - BDW DP aux channel timeout fixes - LSPCON fixes - Cannonlake PLL fixes amdgpu: - Per VM BO support - Powerplay cleanups - CI powerplay support - PASID mgr for kfd - SR-IOV fixes - initial GPU reset for vega10 - Prime mmap support - TTM updates - Clock query interface for Raven - Fence to handle ioctl - UVD encode ring support on Polaris - Transparent huge page DMA support - Compute LRU pipe tweaks - BO flag to allow buffers to opt out of implicit sync - CTX priority setting API - VRAM lost infrastructure plumbing qxl: - fix flicker since atomic rework amdkfd: - Further improvements from internal AMD tree - Usermode events - Drop radeon support nouveau: - Pascal temperature sensor support - Improved BAR2 handling - MMU rework to support Pascal MMU exynos: - Improved HDMI/mixer support - HDMI audio interface support tegra: - Prep work for tegra186 - Cleanup/fixes msm: - Preemption support for a5xx - Display fixes for 8x96 (snapdragon 820) - Async cursor plane fixes - FW loading rework - GPU debugging improvements vc4: - Prep for DSI panels - fix T-format tiling scanout - New madvise ioctl Rockchip: - LVDS support omapdrm: - omap4 HDMI CEC support etnaviv: - GPU performance counters groundwork sun4i: - refactor driver load + TCON backend - HDMI improvements - A31 support - Misc fixes udl: - Probe/EDID read fixes. tilcdc: - Misc fixes. pl111: - Support more variants adv7511: - Improve EDID handling. - HDMI CEC support sii8620: - Add remote control support" * tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits) drm/rockchip: analogix_dp: Use mutex rather than spinlock drm/mode_object: fix documentation for object lookups. drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU drm/i915: Move init_clock_gating() back to where it was drm/i915: Prune the reservation shared fence array drm/i915: Idle the GPU before shinking everything drm/i915: Lock llist_del_first() vs llist_del_all() drm/i915: Calculate ironlake intermediate watermarks correctly, v2. drm/i915: Disable lazy PPGTT page table optimization for vGPU drm/i915/execlists: Remove the priority "optimisation" drm/i915: Filter out spurious execlists context-switch interrupts drm/amdgpu: use irq-safe lock for kiq->ring_lock drm/amdgpu: bypass lru touch for KIQ ring submission drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories() drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs() drm/amd/powerplay: initialize a variable before using it drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug drm/rockchip: add CONFIG_OF dependency for lvds ...
2017-11-15Merge tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds11-205/+382
Pull media updates from Mauro Carvalho Chehab: - Documentation for digital TV (both kAPI and uAPI) are now in sync with the implementation (except for legacy/deprecated ioctls). This is a major step, as there were always a gap there - New sensor driver: imx274 - New cec driver: cec-gpio - New platform driver for rockship rga and tegra CEC - New RC driver: tango-ir - Several cleanups at atomisp driver - Core improvements for RC, CEC, V4L2 async probing support and DVB - Lots of drivers cleanup, fixes and improvements. * tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (332 commits) dvb_frontend: don't use-after-free the frontend struct media: dib0700: fix invalid dvb_detach argument media: v4l2-ctrls: Don't validate BITMASK twice media: s5p-mfc: fix lockdep warning media: dvb-core: always call invoke_release() in fe_free() media: usb: dvb-usb-v2: dvb_usb_core: remove redundant code in dvb_usb_fe_sleep media: au0828: make const array addr_list static media: cx88: make const arrays default_addr_list and pvr2000_addr_list static media: drxd: make const array fastIncrDecLUT static media: usb: fix spelling mistake: "synchronuously" -> "synchronously" media: ddbridge: fix build warnings media: av7110: avoid 2038 overflow in debug print media: Don't do DMA on stack for firmware upload in the AS102 driver media: v4l: async: fix unregister for implicitly registered sub-device notifiers media: v4l: async: fix return of unitialized variable ret media: imx274: fix missing return assignment from call to imx274_mode_regs media: camss-vfe: always initialize reg at vfe_set_xbar_cfg() media: atomisp: make function calls cleaner media: atomisp: get rid of storage_class.h media: atomisp: get rid of wrong stddef.h include ...
2017-11-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds37-359/+300
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ...
2017-11-15mm: simplify nodemask printingMichal Hocko1-1/+3
alloc_warn() and dump_header() have to explicitly handle NULL nodemask which forces both paths to use pr_cont. We can do better. printk already handles NULL pointers properly so all we need is to teach nodemask_pr_args to handle NULL nodemask carefully. This allows simplification of both alloc_warn() and dump_header() and gets rid of pr_cont altogether. This patch has been motivated by patch from Joe Perches http://lkml.kernel.org/r/b31236dfe3fc924054fd7842bde678e71d193638.1509991345.git.joe@perches.com [akpm@linux-foundation.org: fix tile warning, per Arnd] Link: http://lkml.kernel.org/r/20171109100531.3cn2hcqnuj7mjaju@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Joe Perches <joe@perches.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15writeback: remove unused function parameterWang Long1-1/+1
The parameter `struct bdi_writeback *wb` is not been used in the function body. Remove it. Link: http://lkml.kernel.org/r/1509685485-15278-1-git-send-email-wanglong19@meituan.com Signed-off-by: Wang Long <wanglong19@meituan.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm/page_alloc.c: broken deferred calculationPavel Tatashin1-1/+2
In reset_deferred_meminit() we determine number of pages that must not be deferred. We initialize pages for at least 2G of memory, but also pages for reserved memory in this node. The reserved memory is determined in this function: memblock_reserved_memory_within(), which operates over physical addresses, and returns size in bytes. However, reset_deferred_meminit() assumes that that this function operates with pfns, and returns page count. The result is that in the best case machine boots slower than expected due to initializing more pages than needed in single thread, and in the worst case panics because fewer than needed pages are initialized early. Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm, sysctl: make NUMA stats configurableKemi Wang1-0/+10
This is the second step which introduces a tunable interface that allow numa stats configurable for optimizing zone_statistics(), as suggested by Dave Hansen and Ying Huang. ========================================================================= When page allocation performance becomes a bottleneck and you can tolerate some possible tool breakage and decreased numa counter precision, you can do: echo 0 > /proc/sys/vm/numa_stat In this case, numa counter update is ignored. We can see about *4.8%*(185->176) drop of cpu cycles per single page allocation and reclaim on Jesper's page_bench01 (single thread) and *8.1%*(343->315) drop of cpu cycles per single page allocation and reclaim on Jesper's page_bench03 (88 threads) running on a 2-Socket Broadwell-based server (88 threads, 126G memory). Benchmark link provided by Jesper D Brouer (increase loop times to 10000000): https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench ========================================================================= When page allocation performance is not a bottleneck and you want all tooling to work, you can do: echo 1 > /proc/sys/vm/numa_stat This is system default setting. Many thanks to Michal Hocko, Dave Hansen, Ying Huang and Vlastimil Babka for comments to help improve the original patch. [keescook@chromium.org: make sure mutex is a global static] Link: http://lkml.kernel.org/r/20171107213809.GA4314@beast Link: http://lkml.kernel.org/r/1508290927-8518-1-git-send-email-kemi.wang@intel.com Signed-off-by: Kemi Wang <kemi.wang@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Suggested-by: Ying Huang <ying.huang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Luis R . Rodriguez" <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christopher Lameter <cl@linux.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm, pagevec: rename pagevec drained fieldMel Gorman1-2/+2
According to Vlastimil Babka, the drained field in pagevec is potentially misleading because it might be interpreted as draining this pagevec instead of the percpu lru pagevecs. Rename the field for clarity. Link: http://lkml.kernel.org/r/20171019093346.ylahzdpzmoriyf4v@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: remove __GFP_COLDMel Gorman5-17/+2
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: remove cold parameter from free_hot_cold_page*Mel Gorman2-9/+6
Most callers users of free_hot_cold_page claim the pages being released are cache hot. The exception is the page reclaim paths where it is likely that enough pages will be freed in the near future that the per-cpu lists are going to be recycled and the cache hotness information is lost. As no one really cares about the hotness of pages being released to the allocator, just ditch the parameter. The APIs are renamed to indicate that it's no longer about hot/cold pages. It should also be less confusing as there are subtle differences between them. __free_pages drops a reference and frees a page when the refcount reaches zero. free_hot_cold_page handled pages whose refcount was already zero which is non-obvious from the name. free_unref_page should be more obvious. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. [mgorman@techsingularity.net: add pages to head, not tail] Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: remove cold parameter for release_pagesMel Gorman2-2/+2
All callers of release_pages claim the pages being released are cache hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-7-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm, pagevec: remove cold parameter for pagevecsMel Gorman1-3/+1
Every pagevec_init user claims the pages being released are hot even in cases where it is unlikely the pages are hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: only drain per-cpu pagevecs once per pagevec usageMel Gorman1-1/+3
When a pagevec is initialised on the stack, it is generally used multiple times over a range of pages, looking up entries and then releasing them. On each pagevec_release, the per-cpu deferred LRU pagevecs are drained on the grounds the page being released may be on those queues and the pages may be cache hot. In many cases only the first drain is necessary as it's unlikely that the range of pages being walked is racing against LRU addition. Even if there is such a race, the impact is marginal where as constantly redraining the lru pagevecs costs. This patch ensures that pagevec is only drained once in a given lifecycle without increasing the cache footprint of the pagevec structure. Only sparsetruncate tiny is shown here as large files have many exceptional entries and calls pagecache_release less frequently. sparsetruncate (tiny) 4.14.0-rc4 4.14.0-rc4 batchshadow-v1r1 onedrain-v1r1 Min Time 141.00 ( 0.00%) 141.00 ( 0.00%) 1st-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%) 2nd-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%) 3rd-qrtle Time 143.00 ( 0.00%) 143.00 ( 0.00%) Max-90% Time 144.00 ( 0.00%) 144.00 ( 0.00%) Max-95% Time 146.00 ( 0.00%) 145.00 ( 0.68%) Max-99% Time 198.00 ( 0.00%) 194.00 ( 2.02%) Max Time 254.00 ( 0.00%) 208.00 ( 18.11%) Amean Time 145.12 ( 0.00%) 144.30 ( 0.56%) Stddev Time 12.74 ( 0.00%) 9.62 ( 24.49%) Coeff Time 8.78 ( 0.00%) 6.67 ( 24.06%) Best99%Amean Time 144.29 ( 0.00%) 143.82 ( 0.32%) Best95%Amean Time 142.68 ( 0.00%) 142.31 ( 0.26%) Best90%Amean Time 142.52 ( 0.00%) 142.19 ( 0.24%) Best75%Amean Time 142.26 ( 0.00%) 141.98 ( 0.20%) Best50%Amean Time 141.90 ( 0.00%) 141.71 ( 0.13%) Best25%Amean Time 141.80 ( 0.00%) 141.43 ( 0.26%) The impact on bonnie is marginal and within the noise because a significant percentage of the file being truncated has been reclaimed and consists of shadow entries which reduce the hotness of the pagevec_release path. Link: http://lkml.kernel.org/r/20171018075952.10627-5-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm, truncate: do not check mapping for every page being truncatedMel Gorman2-5/+15
During truncation, the mapping has already been checked for shmem and dax so it's known that workingset_update_node is required. This patch avoids the checks on mapping for each page being truncated. In all other cases, a lookup helper is used to determine if workingset_update_node() needs to be called. The one danger is that the API is slightly harder to use as calling workingset_update_node directly without checking for dax or shmem mappings could lead to surprises. However, the API rarely needs to be used and hopefully the comment is enough to give people the hint. sparsetruncate (tiny) 4.14.0-rc4 4.14.0-rc4 oneirq-v1r1 pickhelper-v1r1 Min Time 141.00 ( 0.00%) 140.00 ( 0.71%) 1st-qrtle Time 142.00 ( 0.00%) 141.00 ( 0.70%) 2nd-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%) 3rd-qrtle Time 143.00 ( 0.00%) 143.00 ( 0.00%) Max-90% Time 144.00 ( 0.00%) 144.00 ( 0.00%) Max-95% Time 147.00 ( 0.00%) 145.00 ( 1.36%) Max-99% Time 195.00 ( 0.00%) 191.00 ( 2.05%) Max Time 230.00 ( 0.00%) 205.00 ( 10.87%) Amean Time 144.37 ( 0.00%) 143.82 ( 0.38%) Stddev Time 10.44 ( 0.00%) 9.00 ( 13.74%) Coeff Time 7.23 ( 0.00%) 6.26 ( 13.41%) Best99%Amean Time 143.72 ( 0.00%) 143.34 ( 0.26%) Best95%Amean Time 142.37 ( 0.00%) 142.00 ( 0.26%) Best90%Amean Time 142.19 ( 0.00%) 141.85 ( 0.24%) Best75%Amean Time 141.92 ( 0.00%) 141.58 ( 0.24%) Best50%Amean Time 141.69 ( 0.00%) 141.31 ( 0.27%) Best25%Amean Time 141.38 ( 0.00%) 140.97 ( 0.29%) As you'd expect, the gain is marginal but it can be detected. The differences in bonnie are all within the noise which is not surprising given the impact on the microbenchmark. radix_tree_update_node_t is a callback for some radix operations that optionally passes in a private field. The only user of the callback is workingset_update_node and as it no longer requires a mapping, the private field is removed. Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: batch radix tree operations when truncating pagesJan Kara1-0/+4
Currently we remove pages from the radix tree one by one. To speed up page cache truncation, lock several pages at once and free them in one go. This allows us to batch radix tree operations in a more efficient way and also save round-trips on mapping->tree_lock. As a result we gain about 20% speed improvement in page cache truncation. Data from a simple benchmark timing 10000 truncates of 1024 pages (on ext4 on ramdisk but the filesystem is barely visible in the profiles). The range shows 1% and 95% percentiles of the measured times: 4.14-rc2 4.14-rc2 + batched truncation 248-256 209-219 249-258 209-217 248-255 211-239 248-255 209-217 247-256 210-218 [jack@suse.cz: convert delete_from_page_cache_batch() to pagevec] Link: http://lkml.kernel.org/r/20171018111648.13714-1-jack@suse.cz [akpm@linux-foundation.org: move struct pagevec forward declaration to top-of-file] Link: http://lkml.kernel.org/r/20171010151937.26984-8-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: speed up cancel_dirty_page() for clean pagesJan Kara1-1/+7
Patch series "Speed up page cache truncation", v1. When rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we have noticed a regression in bonnie++ benchmark when deleting files. Eventually we have tracked this down to a fact that page cache truncation got slower by about 10%. There were both gains and losses in the above interval of kernels but we have been able to identify that commit 83929372f629 ("filemap: prepare find and delete operations for huge pages") caused about 10% regression on its own. After some investigation it didn't seem easily possible to fix the regression while maintaining the THP in page cache functionality so we've decided to optimize the page cache truncation path instead to make up for the change. This series is a result of that effort. Patch 1 is an easy speedup of cancel_dirty_page(). Patches 2-6 refactor page cache truncation code so that it is easier to batch radix tree operations. Patch 7 implements batching of deletes from the radix tree which more than makes up for the original regression. This patch (of 7): cancel_dirty_page() does quite some work even for clean pages (fetching of mapping, locking of memcg, atomic bit op on page flags) so it accounts for ~2.5% of cost of truncation of a clean page. That is not much but still dumb for something we don't need at all. Check whether a page is actually dirty and avoid any work if not. Link: http://lkml.kernel.org/r/20171010151937.26984-2-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: zero reserved and unavailable struct pagesPavel Tatashin2-0/+31
Some memory is reserved but unavailable: not present in memblock.memory (because not backed by physical pages), but present in memblock.reserved. Such memory has backing struct pages, but they are not initialized by going through __init_single_page(). In some cases these struct pages are accessed even if they do not contain any data. One example is page_to_pfn() might access page->flags if this is where section information is stored (CONFIG_SPARSEMEM, SECTION_IN_PAGE_FLAGS). One example of such memory: trim_low_memory_range() unconditionally reserves from pfn 0, but e820__memblock_setup() might provide the exiting memory from pfn 1 (i.e. KVM). Since struct pages are zeroed in __init_single_page(), and not during allocation time, we must zero such struct pages explicitly. The patch involves adding a new memblock iterator: for_each_resv_unavail_range(i, p_start, p_end) Which iterates through reserved && !memory lists, and we zero struct pages explicitly by calling mm_zero_struct_page(). === Here is more detailed example of problem that this patch is addressing: Run tested on qemu with the following arguments: -enable-kvm -cpu kvm64 -m 512 -smp 2 This patch reports that there are 98 unavailable pages. They are: pfn 0 and pfns in range [159, 255]. Note, trim_low_memory_range() reserves only pfns in range [0, 15], it does not reserve [159, 255] ones. e820__memblock_setup() reports linux that the following physical ranges are available: [1 , 158] [256, 130783] Notice, that exactly unavailable pfns are missing! Now, lets check what we have in zone 0: [1, 131039] pfn 0, is not part of the zone, but pfns [1, 158], are. However, the bigger problem we have if we do not initialize these struct pages is with memory hotplug. Because, that path operates at 2M boundaries (section_nr). And checks if 2M range of pages is hot removable. It starts with first pfn from zone, rounds it down to 2M boundary (sturct pages are allocated at 2M boundaries when vmemmap is created), and checks if that section is hot removable. In this case start with pfn 1 and convert it down to pfn 0. Later pfn is converted to struct page, and some fields are checked. Now, if we do not zero struct pages, we get unpredictable results. In fact when CONFIG_VM_DEBUG is enabled, and we explicitly set all vmemmap memory to ones, the following panic is observed with kernel test without this patch applied: BUG: unable to handle kernel NULL pointer dereference at (null) IP: is_pageblock_removable_nolock+0x35/0x90 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT ... task: ffff88001f4e2900 task.stack: ffffc90000314000 RIP: 0010:is_pageblock_removable_nolock+0x35/0x90 Call Trace: ? is_mem_section_removable+0x5a/0xd0 show_mem_removable+0x6b/0xa0 dev_attr_show+0x1b/0x50 sysfs_kf_seq_show+0xa1/0x100 kernfs_seq_show+0x22/0x30 seq_read+0x1ac/0x3a0 kernfs_fop_read+0x36/0x190 ? security_file_permission+0x90/0xb0 __vfs_read+0x16/0x30 vfs_read+0x81/0x130 SyS_read+0x44/0xa0 entry_SYSCALL_64_fastpath+0x1f/0xbd Link: http://lkml.kernel.org/r/20171013173214.27300-7-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Tested-by: Bob Picco <bob.picco@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>