aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-08-22kernel/hung_task.c: allow to set checking interval separately from timeoutDmitry Vyukov2-0/+2
Currently task hung checking interval is equal to timeout, as the result hung is detected anywhere between timeout and 2*timeout. This is fine for most interactive environments, but this hurts automated testing setups (syzbot). In an automated setup we need to strictly order CPU lockup < RCU stall < workqueue lockup < task hung < silent loss, so that RCU stall is not detected as task hung and task hung is not detected as silent machine loss. The large variance in task hung detection timeout requires setting silent machine loss timeout to a very large value (e.g. if task hung is 3 mins, then silent loss need to be set to ~7 mins). The additional 3 minutes significantly reduce testing efficiency because usually we crash kernel within a minute, and this can add hours to bug localization process as it needs to do dozens of tests. Allow setting checking interval separately from timeout. This allows to set timeout to, say, 3 minutes, but checking interval to 10 secs. The interval is controlled via a new hung_task_check_interval_secs sysctl, similar to the existing hung_task_timeout_secs sysctl. The default value of 0 results in the current behavior: checking interval is equal to timeout. [akpm@linux-foundation.org: update hung_task_timeout_max's comment] Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22linux/compiler.h: don't use boolRasmus Villemoes1-1/+1
Appararently, it's possible to have a non-trivial TU include a few headers, including linux/build_bug.h, without ending up with linux/types.h. So the 0day bot sent me config: um-x86_64_defconfig (attached as .config) >> include/linux/compiler.h:316:3: error: unknown type name 'bool'; did you mean '_Bool'? bool __cond = !(condition); \ for something I'm working on. Rather than contributing to the #include madness and including linux/types.h from compiler.h, just use int. Link: http://lkml.kernel.org/r/20180817101036.20969-1-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Christopher Li <sparse@chrisli.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22userns: use refcount_t for reference counting instead atomic_tSebastian Andrzej Siewior1-2/+3
refcount_t type and corresponding API should be used instead of atomic_t wh en the variable is used as a reference counter. This avoids accidental refcounter overflows that might lead to use-after-free situations. Link: http://lkml.kernel.org/r/20180703200141.28415-6-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22bdi: use refcount_t for reference counting instead atomic_tSebastian Andrzej Siewior2-3/+4
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This permits avoiding accidental refcounter overflows that might lead to use-after-free situations. Link: http://lkml.kernel.org/r/20180703200141.28415-4-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22kernel.h: documentation for roundup() vs round_up()Kees Cook1-1/+34
Things like 3619dec5103d ("dh key: fix rounding up KDF output length") expose the lack of explicit documentation for roundup() vs round_up(). At least we can try to document it better if anyone goes looking. Link: http://lkml.kernel.org/r/20180703041950.GA43464@beast Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22include/asm-generic/bug.h: clarify valid uses of WARN()Dmitry Vyukov1-3/+13
Explicitly state that WARN*() should be used only for recoverable kernel issues/bugs and that it should not be used for any kind of invalid external inputs or transient conditions. Motivation: it's a very useful capability to be able to understand if a particular kernel splat means a kernel bug or simply an invalid user-space program. For the former one wants to notify kernel developers, while notifying kernel developers for the latter is annoying. Even a kernel developer may not know what to do with a WARNING in an unfamiliar subsystem. This is especially critical for any automated testing systems that may use panic_on_warn and mail kernel developers. The clear separation also serves as an additional documentation: is it a condition that must never occur because of additional checks/logic elsewhere? or is it simply a check for invalid inputs or unfortunate conditions? Use of pr_err() for user messages also leads to better error messages. "Something is wrong in file foo on line X" is not particularly useful message for end user. pr_err() forces developers to write more meaningful error messages for user. As of now we are almost there. We are doing systematic kernel testing with panic_on_warn and are not seeing massive amounts of false positives. But every now and then another WARN on ENOMEM or invalid inputs pops up and leads to a lengthy argument each time. The goal of this change is to officially document the rules. Link: http://lkml.kernel.org/r/20180620103716.61636-1-dvyukov@gmail.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: add vmcoreinfo note to /proc/kcoreOmar Sandoval1-0/+2
The vmcoreinfo information is useful for runtime debugging tools, not just for crash dumps. A lot of this information can be determined by other means, but this is much more convenient, and it only adds a page at most to the file. Link: http://lkml.kernel.org/r/fddbcd08eed76344863303878b12de1c1e2a04b6.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: don't grab lock for kclist_add()Omar Sandoval1-1/+1
Patch series "/proc/kcore improvements", v4. This series makes a few improvements to /proc/kcore. It fixes a couple of small issues in v3 but is otherwise the same. Patches 1, 2, and 3 are prep patches. Patch 4 is a fix/cleanup. Patch 5 is another prep patch. Patches 6 and 7 are optimizations to ->read(). Patch 8 makes it possible to enable CRASH_CORE on any architecture, which is needed for patch 9. Patch 9 adds vmcoreinfo to /proc/kcore. This patch (of 9): kclist_add() is only called at init time, so there's no point in grabbing any locks. We're also going to replace the rwlock with a rwsem, which we don't want to try grabbing during early boot. While we're here, mark kclist_add() with __init so that we'll get a warning if it's called from non-init code. Link: http://lkml.kernel.org/r/98208db1faf167aa8b08eebfa968d95c70527739.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc: spread "const" a bitAlexey Dobriyan1-1/+1
Link: http://lkml.kernel.org/r/20180627200614.GB18434@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: fix comment for NODEMASK_ALLOCOscar Salvador1-1/+1
Currently, NODEMASK_ALLOC allocates a nodemask_t with kmalloc when NODES_SHIFT is higher than 8, otherwise it declares it within the stack. The comment says that the reasoning behind this, is that nodemask_t will be 256 bytes when NODES_SHIFT is higher than 8, but this is not true. For example, NODES_SHIFT = 9 will give us a 64 bytes nodemask_t. Let us fix up the comment for that. Another thing is that it might make sense to let values lower than 128bytes be allocated in the stack. Although this all depends on the depth of the stack (and this changes from function to function), I think that 64 bytes is something we can easily afford. So we could even bump the limit by 1 (from > 8 to > 9). Link: http://lkml.kernel.org/r/20180820085516.9687-1-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22/proc/meminfo: add percpu populated pages countDennis Zhou (Facebook)1-0/+2
Currently, percpu memory only exposes allocation and utilization information via debugfs. This more or less is only really useful for understanding the fragmentation and allocation information at a per-chunk level with a few global counters. This is also gated behind a config. BPF and cgroup, for example, have seen an increase in use causing increased use of percpu memory. Let's make it easier for someone to identify how much memory is being used. This patch adds the "Percpu" stat to meminfo to more easily look up how much percpu memory is in use. This number includes the cost for all allocated backing pages and not just insight at the per a unit, per chunk level. Metadata is excluded. I think excluding metadata is fair because the backing memory scales with the numbere of cpus and can quickly outweigh the metadata. It also makes this calculation light. Link: http://lkml.kernel.org/r/20180807184723.74919-1-dennisszhou@gmail.com Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm, oom: introduce memory.oom.groupRoman Gushchin1-0/+18
For some workloads an intervention from the OOM killer can be painful. Killing a random task can bring the workload into an inconsistent state. Historically, there are two common solutions for this problem: 1) enabling panic_on_oom, 2) using a userspace daemon to monitor OOMs and kill all outstanding processes. Both approaches have their downsides: rebooting on each OOM is an obvious waste of capacity, and handling all in userspace is tricky and requires a userspace agent, which will monitor all cgroups for OOMs. In most cases an in-kernel after-OOM cleaning-up mechanism can eliminate the necessity of enabling panic_on_oom. Also, it can simplify the cgroup management for userspace applications. This commit introduces a new knob for cgroup v2 memory controller: memory.oom.group. The knob determines whether the cgroup should be treated as an indivisible workload by the OOM killer. If set, all tasks belonging to the cgroup or to its descendants (if the memory cgroup is not a leaf cgroup) are killed together or not at all. To determine which cgroup has to be killed, we do traverse the cgroup hierarchy from the victim task's cgroup up to the OOMing cgroup (or root) and looking for the highest-level cgroup with memory.oom.group set. Tasks with the OOM protection (oom_score_adj set to -1000) are treated as an exception and are never killed. This patch doesn't change the OOM victim selection algorithm. Link: http://lkml.kernel.org/r/20180802003201.817-4-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm/page_alloc: Introduce free_area_init_core_hotplugOscar Salvador2-1/+2
Currently, whenever a new node is created/re-used from the memhotplug path, we call free_area_init_node()->free_area_init_core(). But there is some code that we do not really need to run when we are coming from such path. free_area_init_core() performs the following actions: 1) Initializes pgdat internals, such as spinlock, waitqueues and more. 2) Account # nr_all_pages and # nr_kernel_pages. These values are used later on when creating hash tables. 3) Account number of managed_pages per zone, substracting dma_reserved and memmap pages. 4) Initializes some fields of the zone structure data 5) Calls init_currently_empty_zone to initialize all the freelists 6) Calls memmap_init to initialize all pages belonging to certain zone When called from memhotplug path, free_area_init_core() only performs actions #1 and #4. Action #2 is pointless as the zones do not have any pages since either the node was freed, or we are re-using it, eitherway all zones belonging to this node should have 0 pages. For the same reason, action #3 results always in manages_pages being 0. Action #5 and #6 are performed later on when onlining the pages: online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone() online_pages()->move_pfn_range_to_zone()->memmap_init_zone() This patch does two things: First, moves the node/zone initializtion to their own function, so it allows us to create a small version of free_area_init_core, where we only perform: 1) Initialization of pgdat internals, such as spinlock, waitqueues and more 4) Initialization of some fields of the zone structure data These two functions are: pgdat_init_internals() and zone_init_internals(). The second thing this patch does, is to introduce free_area_init_core_hotplug(), the memhotplug version of free_area_init_core(): Currently, we call free_area_init_node() from the memhotplug path. In there, we set some pgdat's fields, and call calculate_node_totalpages(). calculate_node_totalpages() calculates the # of pages the node has. Since the node is either new, or we are re-using it, the zones belonging to this node should not have any pages, so there is no point to calculate this now. Actually, we re-set these values to 0 later on with the calls to: reset_node_managed_pages() reset_node_present_pages() The # of pages per node and the # of pages per zone will be calculated when onlining the pages: online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range() online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range() Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace __paginginit with __init, so their code gets freed up. [osalvador@techadventures.net: fix section usage] Link: http://lkml.kernel.org/r/20180731101752.GA473@techadventures.net [osalvador@suse.de: v6] Link: http://lkml.kernel.org/r/20180801122348.21588-6-osalvador@techadventures.net Link: http://lkml.kernel.org/r/20180730101757.28058-5-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: access zone->node via zone_to_nid() and zone_set_nid()Pavel Tatashin2-15/+20
zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to have inline functions to access this field in order to avoid ifdef's in c files. Link: http://lkml.kernel.org/r/20180730101757.28058-3-osalvador@techadventures.net Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: remove zone_id() and make use of zone_idx() in is_dev_zone()Oscar Salvador1-19/+12
is_dev_zone() is using zone_id() to check if the zone is ZONE_DEVICE. zone_id() looks pretty much the same as zone_idx(), and while the use of zone_idx() is quite spread in the kernel, zone_id() is only being used by is_dev_zone(). This patch removes zone_id() and makes is_dev_zone() use zone_idx() to check the zone, so we do not have two things with the same functionality around. Link: http://lkml.kernel.org/r/20180730133718.28683-1-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: zero out the vma in vma_init()Andrew Morton1-0/+1
Rather than in vm_area_alloc(). To ensure that the various oddball stack-based vmas are in a good state. Some of the callers were zeroing them out, others were not. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm, oom: distinguish blockable mode for mmu notifiersMichal Hocko4-12/+30
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. [akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by: David Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm, swap, get_swap_pages: use entry_size instead of cluster in parameterHuang Ying1-1/+1
As suggested by Matthew Wilcox, it is better to use "int entry_size" instead of "bool cluster" as parameter to specify whether to operate for huge or normal swap entries. Because this improve the flexibility to support other swap entry size. And Dave Hansen thinks that this improves code readability too. So in this patch, the "bool cluster" parameter of get_swap_pages() is replaced by "int entry_size". And nr_swap_entries() trick is used to reduce the binary size when !CONFIG_TRANSPARENT_HUGE_PAGE. text data bss dec hex filename base 24215 2028 340 26583 67d7 mm/swapfile.o head 24123 2004 340 26467 6763 mm/swapfile.o Link: http://lkml.kernel.org/r/20180720071845.17920-7-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shaohua Li <shli@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: struct shrinker: make flags of unsigned typeKirill Tkhai1-2/+2
Currently, there are two flags only, so unsigned is more then enough. Also, move int seeks to keep these fields together. Link: http://lkml.kernel.org/r/153199748720.21131.6476256940113102483.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: struct shrink_control: keep int fields togetherKirill Tkhai1-3/+3
Patch series "Reorderings in struct shrinker and struct shrink_control". These structures are intensively used during reclaim and, displace other data in cache, so there is no a reason they have int fields not grouped together. This patch (of 2): gfp_t is of unsigned type, so let's move nid to keep them together. Link: http://lkml.kernel.org/r/153199747930.21131.861043607301997810.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-20Merge tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds9-43/+71
Pull tracing updates from Steven Rostedt: - Restructure of lockdep and latency tracers This is the biggest change. Joel Fernandes restructured the hooks from irqs and preemption disabling and enabling. He got rid of a lot of the preprocessor #ifdef mess that they caused. He turned both lockdep and the latency tracers to use trace events inserted in the preempt/irqs disabling paths. But unfortunately, these started to cause issues in corner cases. Thus, parts of the code was reverted back to where lockdep and the latency tracers just get called directly (without using the trace events). But because the original change cleaned up the code very nicely we kept that, as well as the trace events for preempt and irqs disabling, but they are limited to not being called in NMIs. - Have trace events use SRCU for "rcu idle" calls. This was required for the preempt/irqs off trace events. But it also had to not allow them to be called in NMI context. Waiting till Paul makes an NMI safe SRCU API. - New notrace SRCU API to allow trace events to use SRCU. - Addition of mcount-nop option support - SPDX headers replacing GPL templates. - Various other fixes and clean ups. - Some fixes are marked for stable, but were not fully tested before the merge window opened. * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) tracing: Fix SPDX format headers to use C++ style comments tracing: Add SPDX License format tags to tracing files tracing: Add SPDX License format to bpf_trace.c blktrace: Add SPDX License format header s390/ftrace: Add -mfentry and -mnop-mcount support tracing: Add -mcount-nop option support tracing: Avoid calling cc-option -mrecord-mcount for every Makefile tracing: Handle CC_FLAGS_FTRACE more accurately Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Uprobes: Simplify uprobe_register() body tracepoints: Free early tracepoints after RCU is initialized uprobes: Use synchronize_rcu() not synchronize_sched() tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister() ftrace: Remove unused pointer ftrace_swapper_pid tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing/irqsoff: Handle preempt_count for different configs tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing: irqsoff: Account for additional preempt_disable trace: Use rcu_dereference_raw for hooks from trace-event subsystem tracing/kprobes: Fix within_notrace_func() to check only notrace functions ...
2018-08-20Merge tag 'ceph-for-4.19-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds7-19/+36
Pull ceph updates from Ilya Dryomov: "The main things are support for cephx v2 authentication protocol and basic support for rbd images within namespaces (myself). Also included are y2038 conversion patches from Arnd, a pile of miscellaneous fixes from Chengguang and Zheng's feature bit infrastructure for the filesystem" * tag 'ceph-for-4.19-rc1' of git://github.com/ceph/ceph-client: (40 commits) ceph: don't drop message if it contains more data than expected ceph: support cephfs' own feature bits crush: fix using plain integer as NULL warning libceph: remove unnecessary non NULL check for request_key ceph: refactor error handling code in ceph_reserve_caps() ceph: refactor ceph_unreserve_caps() ceph: change to void return type for __do_request() ceph: compare fsc->max_file_size and inode->i_size for max file size limit ceph: add additional size check in ceph_setattr() ceph: add additional offset check in ceph_write_iter() ceph: add additional range check in ceph_fallocate() ceph: add new field max_file_size in ceph_fs_client libceph: weaken sizeof check in ceph_x_verify_authorizer_reply() libceph: check authorizer reply/challenge length before reading libceph: implement CEPHX_V2 calculation mode libceph: add authorizer challenge libceph: factor out encrypt_authorizer() libceph: factor out __ceph_x_decrypt() libceph: factor out __prepare_write_connect() libceph: store ceph_auth_handshake pointer in ceph_connection ...
2018-08-20Merge tag 'rtc-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linuxLinus Torvalds1-17/+4
Pull RTC updates from Alexandre Belloni: "It is now possible to add custom sysfs attributes while avoiding a possible race condition. Unused code has been removed resulting in a nice reduction of the code base. And more drivers have been switched to SPDX by their maintainers. Summary: Subsystem: - new helpers to add custom sysfs attributes - struct rtc_task removal along with rtc_irq_[un]register() - rtc_irq_set_state and rtc_irq_set_freq are not exported anymore Drivers: - armada38x: reset after rtc power loss - ds1307: now supports m41t11 - isl1208: now supports isl1219 and tamper detection - pcf2127: internal SRAM support" * tag 'rtc-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (34 commits) rtc: ds1307: simplify hwmon config rtc: s5m: Add SPDX license identifier rtc: maxim: Add SPDX license identifiers rtc: isl1219: add device tree documentation rtc: isl1208: set ev-evienb bit from device tree rtc: isl1208: Add "evdet" interrupt source for isl1219 rtc: isl1208: add support for isl1219 with tamper detection rtc: sysfs: facilitate attribute add to rtc device rtc: remove struct rtc_task char: rtc: remove task handling rtc: pcf85063: preserve control register value between stop and start rtc: sh: remove unused variable rtc_dev rtc: unexport rtc_irq_set_* rtc: simplify rtc_irq_set_state/rtc_irq_set_freq rtc: remove irq_task and irq_task_lock rtc: remove rtc_irq_register/rtc_irq_unregister rtc: sh: remove dead code rtc: sa1100: don't set PIE frequency rtc: ds1307: support m41t11 variant rtc: ds1307: fix data pointer to m41t0 ...
2018-08-20Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hidLinus Torvalds3-12/+21
Pull HID updates from Jiri Kosina: - touch_max detection improvements and quirk handling fixes in wacom driver from Jason Gerecke and Ping Cheng - Palm rejection from Dmitry Torokhov and _dial support from Benjamin Tissoires for hid-multitouch driver - Low voltage support for i2c-hid driver from Stephen Boyd - Guitar-Hero support from Nicolas Adenis-Lamarre - other assorted small fixes and device ID additions * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (40 commits) HID: intel_ish-hid: tx_buf memory leak on probe/remove HID: intel-ish-hid: Prevent loading of driver on Mehlow HID: cougar: Add support for the Cougar 500k Gaming Keyboard HID: cougar: make compare_device_paths reusable HID: intel-ish-hid: remove redundant variable num_frags HID: multitouch: handle palm for touchscreens HID: multitouch: touchscreens also use confidence reports HID: multitouch: report MT_TOOL_PALM for non-confident touches HID: microsoft: support the Surface Dial HID: core: do not upper bound the collection stack HID: input: enable Totem on the Dell Canvas 27 HID: multitouch: remove one copy of values HID: multitouch: ditch mt_report_id HID: multitouch: store a per application quirks value HID: multitouch: Store per collection multitouch data HID: multitouch: make sure the static list of class is not changed input: add MT_TOOL_DIAL HID: elan: Add support for touchpad on the Toshiba Click Mini L9W HID: elan: Add USB-id for HP x2 10-n000nd touchpad HID: elan: Add a flag for selecting if the touchpad has a LED ...
2018-08-20Merge tag 'backlight-next-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlightLinus Torvalds1-1/+0
Pull backlight updates from Lee Jones: "Core Framework: - Remove unused/obsolete code/comments New Functionality: - Allow less granular brightness specification for high-res PWMs; pwm_bl - Align brightness {inc,dec}rements with that perceived by the human-eye; pwm_bl Fix-ups: - Prepare for the introduction of -Wimplicit-fall-through; adp8860_bl Bug Fixes: - Fix uninitialised variable; pwm_bl" * tag 'backlight-next-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: backlight: pwm_bl: Fix uninitialized variable backlight: adp8860: Mark expected switch fall-through backlight: Remove obsolete comment for ->state dt-bindings: pwm-backlight: Move brightness-levels to optional backlight: pwm_bl: Compute brightness of LED linearly to human eye dt-bindings: pwm-backlight: Add a num-interpolation-steps property backlight: pwm_bl: Linear interpolation between brightness-levels
2018-08-20Merge tag 'mfd-next-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfdLinus Torvalds11-17/+4812
Pull MFD updates from Lee Jones: "New Drivers: - Add Cirrus Logic Madera Codec (CS47L35, CS47L85 and CS47L90/91) driver - Add ChromeOS EC CEC driver - Add ROHM BD71837 PMIC driver New Device Support: - Add support for Dialog Semi DA9063L PMIC variant to DA9063 - Add support for Intel Ice Lake to Intel-PLSS-PCI - Add support for X-Powers AXP806 to AXP20x New Functionality: - Add support for USB Charging to the ChromeOS Embedded Controller - Add support for HDMI CEC to the ChromeOS Embedded Controller - Add support for HDMI CEC to Intel HDMI - Add support for accessory detection to Madera devices - Allow individual pins to be configured via DT' wlf,csnaddr-pd - Provide legacy platform specific EEPROM/Watchdog commands; rave-sp Fix-upsL - Trivial renaming/spelling fixes; cros_ec, da9063-* - Convert to Managed Resources (devm_*); da9063-*, ti_am335x_tscadc - Transition to helper macros/functions; da9063-* - Constify; kempld-core - Improve error path/messages; wm8994-core - Disable IRQs locally instead of relying on USB subsystem; dln2 - Remove unused code; rave-sp - New exports; sec-core Bug Fixes: - Fix possible false I2C transaction error; arizona-core - Fix declared memory area size; hi655x-pmic - Fix checksum type; rave-sp - Fix incorrect default serial port configuration: rave-sp - Fix incorrect coherent DMA mask for sub-devices; sm501" * tag 'mfd-next-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (60 commits) mfd: madera: Add register definitions for accessory detect mfd: sm501: Set coherent_dma_mask when creating subdevices mfd: bd71837: Devicetree bindings for ROHM BD71837 PMIC mfd: bd71837: Core driver for ROHM BD71837 PMIC media: platform: cros-ec-cec: Fix dependency on MFD_CROS_EC mfd: sec-core: Export OF module alias table mfd: as3722: Disable auto-power-on when AC OK mfd: axp20x: Support AXP806 in I2C mode mfd: axp20x: Add self-working mode support for AXP806 dt-bindings: mfd: axp20x: Add "self-working" mode for AXP806 mfd: wm8994: Allow to configure CS/ADDR Pulldown from dts mfd: wm8994: Allow to configure Speaker Mode Pullup from dts mfd: rave-sp: Emulate CMD_GET_STATUS on device that don't support it mfd: rave-sp: Add legacy watchdog ping command translation mfd: rave-sp: Add legacy EEPROM access command translation mfd: rave-sp: Initialize flow control and parity of the port mfd: rave-sp: Fix incorrectly specified checksum type mfd: rave-sp: Remove unused defines mfd: hi655x: Fix regmap area declared size for hi655x mfd: ti_am335x_tscadc: Fix struct clk memory leak ...
2018-08-20Raise the minimum required gcc version to 4.6Joe Perches1-68/+18
Various architectures fail to build properly with older versions of the gcc compiler. An example from Guenter Roeck in thread [1]: > > In file included from ./include/linux/mm.h:17:0, > from ./include/linux/pid_namespace.h:7, > from ./include/linux/ptrace.h:10, > from arch/openrisc/kernel/asm-offsets.c:32: > ./include/linux/mm_types.h:497:16: error: flexible array member in otherwise empty struct > > This is just an example with gcc 4.5.1 for or32. I have seen the problem > with gcc 4.4 (for unicore32) as well. So update the minimum required version of gcc to 4.6. [1] https://lore.kernel.org/lkml/20180814170904.GA12768@roeck-us.net/ Miscellanea: - Update Documentation/process/changes.rst - Remove and consolidate version test blocks in compiler-gcc.h for versions lower than 4.6 Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-20Merge branch 'for-4.19/multitouch-multiaxis' into for-linusJiri Kosina2-8/+16
Multitouch updates: - Dial support - Palm rejection for touchscreens - a few small assorted fixes
2018-08-20Merge branch 'for-4.19/i2c-hid' into for-linusJiri Kosina1-4/+3
Low voltage support for i2c-hid
2018-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds8-12/+32
Pull networking fixes from David Miller: 1) Fix races in IPVS, from Tan Hu. 2) Missing unbind in matchall classifier, from Hangbin Liu. 3) Missing act_ife action release, from Vlad Buslov. 4) Cure lockdep splats in ila, from Cong Wang. 5) veth queue leak on link delete, from Toshiaki Makita. 6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses. From Kees Cook. 7) RCU usage fixup in XDP, from Tariq Toukan. 8) Two TCP ULP fixes from Daniel Borkmann. 9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner Kallweit. 10) Always take tcf_lock with BH disabled, otherwise we can deadlock with rate estimator code paths. From Vlad Buslov. 11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume properly. From Jian-Hong Pan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) ip6_vti: fix creating fallback tunnel device for vti6 ip_vti: fix a null pointer deferrence when create vti fallback tunnel r8169: don't use MSI-X on RTL8106e net: lan743x_ptp: convert to ktime_get_clocktai_ts64 net: sched: always disable bh when taking tcf_lock ip6_vti: simplify stats handling in vti6_xmit bpf: fix redirect to map under tail calls r8169: add missing Kconfig dependency tools/bpf: fix bpf selftest test_cgroup_storage failure bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist bpf, sockmap: fix map elem deletion race with smap_stop_sock bpf, sockmap: fix leakage of smap_psock_map_entry tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach tcp, ulp: add alias for all ulp modules bpf: fix a rcu usage warning in bpf_prog_array_copy_core() samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM net/xdp: Fix suspicious RCU usage warning net/mlx5e: Delete unneeded function argument Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb isdn: Disable IIOCDBGVAR ...
2018-08-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-5/+25
Pull first set of KVM updates from Paolo Bonzini: "PPC: - minor code cleanups x86: - PCID emulation and CR3 caching for shadow page tables - nested VMX live migration - nested VMCS shadowing - optimized IPI hypercall - some optimizations ARM will come next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits) kvm: x86: Set highest physical address bits in non-present/reserved SPTEs KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c KVM: X86: Implement PV IPIs in linux guest KVM: X86: Add kvm hypervisor init time platform setup callback KVM: X86: Implement "send IPI" hypercall KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs() KVM: x86: Skip pae_root shadow allocation if tdp enabled KVM/MMU: Combine flushing remote tlb in mmu_set_spte() KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup KVM: vmx: move struct host_state usage to struct loaded_vmcs KVM: vmx: compute need to reload FS/GS/LDT on demand KVM: nVMX: remove a misleading comment regarding vmcs02 fields KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state() KVM: vmx: add dedicated utility to access guest's kernel_gs_base KVM: vmx: track host_state.loaded using a loaded_vmcs pointer KVM: vmx: refactor segmentation code in vmx_save_host_state() kvm: nVMX: Fix fault priority for VMX operations kvm: nVMX: Fix fault vector for VMX operation at CPL > 0 ...
2018-08-19Merge tag 'riscv-for-linus-4.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linuxLinus Torvalds1-0/+1
Pull RISC-V updates from Palmer Dabbelt: "This contains some major improvements to the RISC-V port, including the necessary interrupt controller and timer support to actually make it to userspace. Support for three devices has been added: - the ISA-mandated timers on RISC-V systems. - the ISA-mandated first-level interrupt controller on RISC-V systems, which is handled as part of our core arch code because it's very small and tightly tied to the ISA. - SiFive's platform-level interrupt controller, which talks to the actual devices. In addition to these new devices, there are a handful of cleanups all over the RISC-V tree: - build fixes for various configurations: * A fix to the vDSO build's makefile so it respects CFLAGS. * The addition of __lshrti3, a libgcc derived function necessary for some 32-bit configurations. * !SMP && PERF_EVENTS - Cleanups to the arch code to remove the remnants of old versions of the drivers that were just properly submitted. * Some dead code from the timer driver, most of which wasn't ever even compiled. * Cleanups of some interrupt #defines, which are now local to the interrupt handling code. - Fixes to ptrace(), which while not being sufficient to fully make GDB work are at least sufficient to get simple GDB tasks to work. - Early printk support via RISC-V's architecturally mandated SBI console device. - A fix to our early debug trap handler to ensure it's always aligned. These patches have all been through a fairly extensive review process, but as this enables a whole pile of functionality (ie, userspace) I'm confident we'll need to submit a few more patches. The only concrete issues I know about are the sys_riscv_flush_icache patches, but as I managed to screw those up on Friday I figured it'd be best to let them bake another week. This tag boots a Fedora root filesystem on QEMU's master branch for me, and before this morning's rebase (from 4.18-rc8 to 4.18) it booted on the HiFive Unleashed. Thanks to Christoph Hellwig and the other guys at WD for getting the new drivers in shape!" * tag 'riscv-for-linus-4.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: dt-bindings: interrupt-controller: SiFive Plaform Level Interrupt Controller dt-bindings: interrupt-controller: RISC-V local interrupt controller RISC-V: Fix !CONFIG_SMP compilation error irqchip: add a SiFive PLIC driver RISC-V: Add the directive for alignment of stvec's value clocksource: new RISC-V SBI timer driver RISC-V: implement low-level interrupt handling RISC-V: add a definition for the SIE SEIE bit RISC-V: remove INTERRUPT_CAUSE_* defines from asm/irq.h RISC-V: simplify software interrupt / IPI code RISC-V: remove timer leftovers RISC-V: Add early printk support via the SBI console RISC-V: Don't increment sepc after breakpoint. RISC-V: implement __lshrti3. RISC-V: Use KBUILD_CFLAGS instead of KCFLAGS when building the vDSO
2018-08-18Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds7-33/+81
Pull input updates from Dmitry Torokhov: - a new driver for Rohm BU21029 touch controller - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free - updates to Atmel, eeti. pxrc and iforce drivers - assorted driver cleanups and fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits) MAINTAINERS: Add PhoenixRC Flight Controller Adapter Input: do not use WARN() in input_alloc_absinfo() Input: mark expected switch fall-throughs Input: raydium_i2c_ts - use true and false for boolean values Input: evdev - switch to bitmap API Input: gpio-keys - switch to bitmap_zalloc() Input: elan_i2c_smbus - cast sizeof to int for comparison bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free() md: Avoid namespace collision with bitmap API dm: Avoid namespace collision with bitmap API Input: pm8941-pwrkey - add resin entry Input: pm8941-pwrkey - abstract register offsets and event code Input: iforce - reorganize joystick configuration lists Input: atmel_mxt_ts - move completion to after config crc is updated Input: atmel_mxt_ts - don't report zero pressure from T9 Input: atmel_mxt_ts - zero terminate config firmware file Input: atmel_mxt_ts - refactor config update code to add context struct Input: atmel_mxt_ts - config CRC may start at T71 Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE ...
2018-08-18Merge tag 'hwlock-v4.19' of git://github.com/andersson/remoteprocLinus Torvalds1-1/+36
Pull hwspinlock updates from Bjorn Andersson: "This introduces devres helpers and an API to request a lock by name, then migrates the sprd SPI driver to use these" * tag 'hwlock-v4.19' of git://github.com/andersson/remoteproc: hwspinlock: Fix incorrect return pointers spi: sprd: Change to use devm_hwspin_lock_request_specific() spi: sprd: Replace of_hwspin_lock_get_id() with of_hwspin_lock_get_id_byname() hwspinlock: Fix one comment mistake hwspinlock: Remove redundant config hwspinlock: Add devm_xxx() APIs to register/unregister one hwlock controller hwspinlock: Add devm_xxx() APIs to request/free hwlock hwspinlock: Add one new API to support getting a specific hwlock by the name
2018-08-18Merge tag 'rproc-v4.19' of git://github.com/andersson/remoteprocLinus Torvalds2-9/+14
Pull remoteproc updates from Bjorn Andersson: "This adds support for pre-start and post-shutdown hooks for remoteproc subdevices, refactors the Qualcomm Hexagon support to allow reuse between several drivers, makes authentication in the MDT file loader optional, migrates a few format strings to use %pK and migrates the Davinci driver to use the reset framework" * tag 'rproc-v4.19' of git://github.com/andersson/remoteproc: remoteproc/davinci: use the reset framework remoteproc/davinci: Mark error recovery as disabled remoteproc: st_slim: replace "%p" with "%pK" remoteproc: replace "%p" with "%pK" remoteproc: qcom: fix Q6V5_WCSS dependencies remoteproc: Reset table_ptr in rproc_start() failure paths remoteproc: qcom: q6v5-pil: fix modem hang on SDM845 after axis2 clk unvote remoteproc: qcom q6v5: fix modular build remoteproc: Introduce prepare and unprepare for subdevices remoteproc: rename subdev probe and remove functions remoteproc: Make client initialize ops in rproc_subdev remoteproc: Make start and stop in subdev optional remoteproc: Rename subdev functions to start/stop remoteproc: qcom: Introduce Hexagon V5 based WCSS driver remoteproc: qcom: q6v5-pil: Use common q6v5 helpers remoteproc: qcom: adsp: Use common q6v5 helpers remoteproc: q6v5: Extract common resource handling remoteproc: qcom: mdt_loader: Make the firmware authentication optional
2018-08-18Merge tag 'dmaengine-4.19-rc1' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds2-1/+7
Pull DMAengine updates from Vinod Koul: "This round brings couple of framework changes, a new driver and usual driver updates: - new managed helper for dmaengine framework registration - split dmaengine pause capability to pause and resume and allow drivers to report that individually - update dma_request_chan_by_mask() to handle deferred probing - move imx-sdma to use virt-dma - new driver for Actions Semi Owl family S900 controller - minor updates to intel, renesas, mv_xor, pl330 etc" * tag 'dmaengine-4.19-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (46 commits) dmaengine: Add Actions Semi Owl family S900 DMA driver dt-bindings: dmaengine: Add binding for Actions Semi Owl SoCs dmaengine: sh: rcar-dmac: Should not stop the DMAC by rcar_dmac_sync_tcr() dmaengine: mic_x100_dma: use the new helper to simplify the code dmaengine: add a new helper dmaenginem_async_device_register dmaengine: imx-sdma: add memcpy interface dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff' dmaengine: dma_request_chan_by_mask() to handle deferred probing dmaengine: pl330: fix irq race with terminate_all dmaengine: Revert "dmaengine: mv_xor_v2: enable COMPILE_TEST" dmaengine: mv_xor_v2: use {lower,upper}_32_bits to configure HW descriptor address dmaengine: mv_xor_v2: enable COMPILE_TEST dmaengine: mv_xor_v2: move unmap to before callback dmaengine: mv_xor_v2: convert callback to helper function dmaengine: mv_xor_v2: kill the tasklets upon exit dmaengine: mv_xor_v2: explicitly freeup irq dmaengine: sh: rcar-dmac: Add dma_pause operation dmaengine: sh: rcar-dmac: add a new function to clear CHCR.DE with barrier dmaengine: idma64: Support dmaengine_terminate_sync() dmaengine: hsu: Support dmaengine_terminate_sync() ...
2018-08-18Merge tag 'mmc-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds4-6/+11
Pull MMC updates from Ulf Hansson: "Updates for MMC for v4.19. MMC core: - Add some fine-grained hooks to further support HS400 tuning - Improve error path for bus width setting for HS400es - Use a common method when checking R1 status MMC host: - renesas_sdhi: Add r8a77990 support - renesas_sdhi: Add eMMC HS400 mode support - tmio/renesas_sdhi: Improve tuning/clock management - tmio: Add eMMC HS400 mode support - sunxi: Add support for 3.3V eMMC DDR mode - mmci: Initial support to manage variant specific callbacks - sdhci: Don't try 3.3V I/O voltage if not supported - sdhci-pci-dwc-mshc: Add driver to support Synopsys dwc mshc SDHCI PCI - sdhci-of-dwcmshc: Add driver to support Synopsys DWC MSHC SDHCI - sdhci-msm: Add support for new version sdcc V5 - sdhci-pci-o2micro: Add support for O2 eMMC HS200 mode - sdhci-pci-o2micro: Add support for O2 hardware tuning - sdhci-pci-o2micro: Add MSI interrupt support for O2 SD host - sdhci-pci: Add support for Intel ICP - sdhci-tegra: Prevent ACMD23 and HS200 mode on Tegra 3 - sdhci-tegra: Fix eMMC DDR52 mode - sdhci-tegra: Improve clock management - dw_mmc-rockchip: Document compatible string for px30 - sdhci-esdhc-imx: Add support for 3.3V eMMC DDR mode - sdhci-of-esdhc: Set proper DMA mask for ls104x chips - sdhci-of-esdhc: Improve clock management - sdhci-of-arasan: Add a quirk to manage unstable clocks - dw_mmc-exynos: Address potential external abort during system resume - pxamci: Add support for common MMC DT bindings - pxamci: Several cleanups and improvements - pxamci: Merge immutable branch for pxa to switch to DMA slave maps" * tag 'mmc-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (56 commits) mmc: core: improve reasonableness of bus width setting for HS400es mmc: tmio: remove unneeded variable in tmio_mmc_start_command() mmc: renesas_sdhi: Fix sampling clock position selecting mmc: tmio: Fix tuning flow mmc: sunxi: remove output of virtual base address dt-bindings: mmc: rockchip-dw-mshc: add description for px30 mmc: renesas_sdhi: Add r8a77990 support mmc: sunxi: allow 3.3V DDR when DDR is available mmc: mmci: Add and implement a ->dma_setup() callback for qcom dml mmc: mmci: Initial support to manage variant specific callbacks mmc: tegra: Force correct divider calculation on DDR50/52 mmc: sdhci: Add MSI interrupt support for O2 SD host mmc: sdhci: Add support for O2 hardware tuning mmc: sdhci: Export sdhci tuning function symbol mmc: sdhci: Change O2 Host HS200 mode clock frequency to 200MHz mmc: sdhci: Add support for O2 eMMC HS200 mode mmc: tegra: Add and use tegra_sdhci_get_max_clock() mmc: sdhci-esdhc-imx: fix indent mmc: sdhci-esdhc-imx: disable clocks before changing frequency mmc: tegra: prevent ACMD23 on Tegra 3 ...
2018-08-18pcmcia: remove long deprecated pcmcia_request_exclusive_irq() functionLinus Torvalds1-10/+0
This function was created as a deprecated fallback case back in 2010 by commit eb14120f743d ("pcmcia: re-work pcmcia_request_irq()") for legacy cases. Actual in-kernel users haven't been around for a long while. The last in-kernel user was apparently removed four years ago by commit 5f5316fcd08e ("am2150: Update nmclan_cs.c to use update PCMCIA API"). Just remove it entirely. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-18deprecate the '__deprecated' attribute warnings entirely and for goodLinus Torvalds2-20/+2
We haven't had lots of deprecation warnings lately, but the rdma use of it made them flare up again. They are not useful. They annoy everybody, and nobody ever does anything about them, because it's always "somebody elses problem". And when people start thinking that warnings are normal, they stop looking at them, and the real warnings that mean something go unnoticed. If you want to get rid of a function, just get rid of it. Convert every user to the new world order. And if you can't do that, then don't annoy everybody else with your marking that says "I couldn't be bothered to fix this, so I'll just spam everybody elses build logs with warnings about my laziness". Make a kernelnewbies wiki page about things that could be cleaned up, write a blog post about it, or talk to people on the mailing lists. But don't add warnings to the kernel build about cleanup that you think should happen but you aren't doing yourself. Don't. Just don't. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-18Merge tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds5-54/+80
Pull driver core updates from Greg KH: "Here are all of the driver core and related patches for 4.19-rc1. Nothing huge here, just a number of small cleanups and the ability to now stop the deferred probing after init happens. All of these have been in linux-next for a while with only a merge issue reported" * tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits) base: core: Remove WARN_ON from link dependencies check drivers/base: stop new probing during shutdown drivers: core: Remove glue dirs from sysfs earlier driver core: remove unnecessary function extern declare sysfs.h: fix non-kernel-doc comment PM / Domains: Stop deferring probe at the end of initcall iommu: Remove IOMMU_OF_DECLARE iommu: Stop deferring probe at end of initcalls pinctrl: Support stopping deferred probe after initcalls dt-bindings: pinctrl: add a 'pinctrl-use-default' property driver core: allow stopping deferred probe after init driver core: add a debugfs entry to show deferred devices sysfs: Fix internal_create_group() for named group updates base: fix order of OF initialization linux/device.h: fix kernel-doc notation warning Documentation: update firmware loader fallback reference kobject: Replace strncpy with memcpy drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number kernfs: Replace strncpy with memcpy device: Add #define dev_fmt similar to #define pr_fmt ...
2018-08-18Merge tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds16-55/+769
Pull char/misc driver updates from Greg KH: "Here is the bit set of char/misc drivers for 4.19-rc1 There is a lot here, much more than normal, seems like everyone is writing new driver subsystems these days... Anyway, major things here are: - new FSI driver subsystem, yet-another-powerpc low-level hardware bus - gnss, finally an in-kernel GPS subsystem to try to tame all of the crazy out-of-tree drivers that have been floating around for years, combined with some really hacky userspace implementations. This is only for GNSS receivers, but you have to start somewhere, and this is great to see. Other than that, there are new slimbus drivers, new coresight drivers, new fpga drivers, and loads of DT bindings for all of these and existing drivers. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (255 commits) android: binder: Rate-limit debug and userspace triggered err msgs fsi: sbefifo: Bump max command length fsi: scom: Fix NULL dereference misc: mic: SCIF Fix scif_get_new_port() error handling misc: cxl: changed asterisk position genwqe: card_base: Use true and false for boolean values misc: eeprom: assignment outside the if statement uio: potential double frees if __uio_register_device() fails eeprom: idt_89hpesx: clean up an error pointer vs NULL inconsistency misc: ti-st: Fix memory leak in the error path of probe() android: binder: Show extra_buffers_size in trace firmware: vpd: Fix section enabled flag on vpd_section_destroy platform: goldfish: Retire pdev_bus goldfish: Use dedicated macros instead of manual bit shifting goldfish: Add missing includes to goldfish.h mux: adgs1408: new driver for Analog Devices ADGS1408/1409 mux dt-bindings: mux: add adi,adgs1408 Drivers: hv: vmbus: Cleanup synic memory free path Drivers: hv: vmbus: Remove use of slow_virt_to_phys() Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind() ...
2018-08-18Merge tag 'staging-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds2-0/+19
Pull staging and IIO updates from Greg KH: "Here are the big staging/iio patches for 4.19-rc1. Lots of churn here, with tons of cleanups happening in staging drivers, a removal of an old crypto driver that no one was using (skein), and the addition of some new IIO drivers. Also added was a "gasket" driver from Google that needs loads of work and the erofs filesystem. Even with adding all of the new drivers and a new filesystem, we are only adding about 1000 lines overall to the kernel linecount, which shows just how much cleanup happened, and how big the unused crypto driver was. All of these have been in the linux-next tree for a while now with no reported issues" * tag 'staging-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (903 commits) staging:rtl8192u: Remove unused macro definitions - Style staging:rtl8192u: Add spaces around '+' operator - Style staging:rtl8192u: Remove stale comment - Style staging: rtl8188eu: remove unused mp_custom_oid.h staging: fbtft: Add spaces around / - Style staging: fbtft: Erases some repetitive usage of function name - Style staging: fbtft: Adjust some empty-line problems - Style staging: fbtft: Removes one nesting level to help readability - Style staging: fbtft: Changes gamma table to define. staging: fbtft: A bit more information on dev_err. staging: fbtft: Fixes some alignment issues - Style staging: fbtft: Puts macro arguments in parenthesis to avoid precedence issues - Style staging: rtl8188eu: remove unused array dB_Invert_Table staging: rtl8188eu: remove whitespace, add missing blank line staging: rtl8188eu: use is_multicast_ether_addr in rtw_sta_mgt.c staging: rtl8188eu: remove whitespace - style staging: rtl8188eu: cleanup block comment - style staging: rtl8188eu: use is_multicast_ether_addr in rtl8188eu_xmit.c staging: rtl8188eu: use is_multicast_ether_addr in recv_linux.c staging: rtlwifi: refactor rtl_get_tcb_desc ...
2018-08-18Merge tag 'tty-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds6-5/+43
Pull tty/serial driver updates from Greg KH: "Here is the big tty and serial driver pull request for 4.19-rc1. It's not all that big, just a number of small serial driver updates and fixes, along with some better vt handling for unicode characters for those using braille terminals. All of these patches have been in linux-next for a long time with no reported issues" * tag 'tty-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (73 commits) tty: serial: 8250: Revert NXP SC16C2552 workaround serial: 8250_exar: Read INT0 from slave device, too tty: rocket: Fix possible buffer overwrite on register_PCI serial: 8250_dw: Add ACPI support for uart on Broadcom SoC serial: 8250_dw: always set baud rate in dw8250_set_termios dt-bindings: serial: Add binding for uartlite tty: serial: uartlite: Add support for suspend and resume tty: serial: uartlite: Add clock adaptation tty: serial: uartlite: Add structure for private data serial: sh-sci: Improve support for separate TEI and DRI interrupts serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE serial: sh-sci: Allow for compressed SCIF address serial: sh-sci: Improve interrupts description serial: 8250: Use cached port name directly in messages serial: 8250_exar: Drop unused variable in pci_xr17v35x_setup() vt: drop unused struct vt_struct vt: avoid a VLA in the unicode screen scroll function vt: add /dev/vcsu* to devices.txt vt: coherence validation code for the unicode screen buffer vt: selection: take screen contents from uniscr if available ...
2018-08-18Merge tag 'usb-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds11-49/+405
Pull USB/PHY updates from Greg KH: "Here is the big USB and phy driver patch set for 4.19-rc1. Nothing huge but there was a lot of work that happened this development cycle: - lots of type-c work, with drivers graduating out of staging, and displayport support being added. - new PHY drivers - the normal collection of gadget driver updates and fixes - code churn to work on the urb handling path, using irqsave() everywhere in anticipation of making this codepath a lot simpler in the future. - usbserial driver fixes and reworks - other misc changes All of these have been in linux-next with no reported issues for a while" * tag 'usb-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (159 commits) USB: serial: pl2303: add a new device id for ATEN usb: renesas_usbhs: Kconfig: convert to SPDX identifiers usb: dwc3: gadget: Check MaxPacketSize from descriptor usb: dwc2: Turn on uframe_sched on "stm32f4x9_fsotg" platforms usb: dwc2: Turn on uframe_sched on "amlogic" platforms usb: dwc2: Turn on uframe_sched on "his" platforms usb: dwc2: Turn on uframe_sched on "bcm" platforms usb: dwc2: gadget: ISOC's starting flow improvement usb: dwc2: Make dwc2_readl/writel functions endianness-agnostic. usb: dwc3: core: Enable AutoRetry feature in the controller usb: dwc3: Set default mode for dwc_usb31 usb: gadget: udc: renesas_usb3: Add register of usb role switch usb: dwc2: replace ioread32/iowrite32_rep with dwc2_readl/writel_rep usb: dwc2: Modify dwc2_readl/writel functions prototype usb: dwc3: pci: Intel Merrifield can be host usb: dwc3: pci: Supply device properties via driver data arm64: dts: dwc3: description of incr burst type usb: dwc3: Enable undefined length INCR burst type usb: dwc3: add global soc bus configuration reg0 usb: dwc3: Describe 'wakeup_work' field of struct dwc3_pci ...
2018-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller3-4/+8
Daniel Borkmann says: ==================== pull-request: bpf 2018-08-18 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit restrictions, from Yonghong. 2) Fix a suspicious RCU rcu_dereference_check() warning triggered from removing a device's XDP memory allocator by using the correct rhashtable lookup function, from Tariq. 3) A batch of BPF sockmap and ULP fixes mainly fixing leaks and races as well as enforcing module aliases for ULPs. Another fix for BPF map redirect to make them work again with tail calls, from Daniel. 4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller4-5/+10
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Infinite loop in IPVS when net namespace is released, from Tan Hu. 2) Do not show negative timeouts in ip_vs_conn by using the new jiffies_delta_to_msecs(), patches from Matteo Croce. 3) Set F_IFACE flag for linklocal addresses in ip6t_rpfilter, from Florian Westphal. 4) Fix overflow in set size allocation, from Taehee Yoo. 5) Use netlink_dump_start() from ctnetlink to fix memleak from the error path, again from Florian. 6) Register nfnetlink_subsys in last place, otherwise netns init path may lose race and see net->nft uninitialized data. This also reverts previous attempt to fix this by increase netns refcount, patches from Florian. 7) Remove conntrack entries on layer 4 protocol tracker module removal, from Florian. 8) Use GFP_KERNEL_ACCOUNT for xtables blob allocation, from Michal Hocko. 9) Get tproxy documentation in sync with existing codebase, from Mate Eckl. 10) Honor preset layer 3 protocol via ctx->family in the new nft_ct timeout infrastructure, from Harsha Sharma. 11) Let uapi nfnetlink_osf.h compile standalone with no errors, from Dmitry V. Levin. 12) Missing braces compilation warning in nft_tproxy, patch from Mate Eclk. 13) Disregard bogus check to bail out on non-anonymous sets from the dynamic set update extension. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-17Merge tag '9p-for-4.19-2' of git://github.com/martinetd/linuxLinus Torvalds1-7/+4
Pull 9p updates from Dominique Martinet: "This contains mostly fixes (6 to be backported to stable) and a few changes, here is the breakdown: - rework how fids are attributed by replacing some custom tracking in a list by an idr - for packet-based transports (virtio/rdma) validate that the packet length matches what the header says - a few race condition fixes found by syzkaller - missing argument check when NULL device is passed in sys_mount - a few virtio fixes - some spelling and style fixes" * tag '9p-for-4.19-2' of git://github.com/martinetd/linux: (21 commits) net/9p/trans_virtio.c: add null terminal for mount tag 9p/virtio: fix off-by-one error in sg list bounds check 9p: fix whitespace issues 9p: fix multiple NULL-pointer-dereferences fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed 9p: validate PDU length net/9p/trans_fd.c: fix race by holding the lock net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree() net/9p/virtio: Fix hard lockup in req_done net/9p/trans_virtio.c: fix some spell mistakes in comments 9p/net: Fix zero-copy path in the 9p virtio transport 9p: Embed wait_queue_head into p9_req_t 9p: Replace the fidlist with an IDR 9p: Change p9_fid_create calling convention 9p: Fix comment on smp_wmb net/9p/client.c: version pointer uninitialized fs/9p/v9fs.c: fix spelling mistake "Uknown" -> "Unknown" net/9p: fix error path of p9_virtio_probe 9p/net/protocol.c: return -ENOMEM when kmalloc() failed net/9p/client.c: add missing '\n' at the end of p9_debug() ...
2018-08-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds18-67/+209
Merge updates from Andrew Morton: - a few misc things - a few Y2038 fixes - ntfs fixes - arch/sh tweaks - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) mm/hmm.c: remove unused variables align_start and align_end fs/userfaultfd.c: remove redundant pointer uwq mm, vmacache: hash addresses based on pmd mm/list_lru: introduce list_lru_shrink_walk_irq() mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one() mm/list_lru.c: move locking from __list_lru_walk_one() to its caller mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node() mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP mm/sparse: delete old sparse_init and enable new one mm/sparse: add new sparse_init_nid() and sparse_init() mm/sparse: move buffer init/fini to the common place mm/sparse: use the new sparse buffer functions in non-vmemmap mm/sparse: abstract sparse buffer allocations mm/hugetlb.c: don't zero 1GiB bootmem pages mm, page_alloc: double zone's batchsize mm/oom_kill.c: document oom_lock mm/hugetlb: remove gigantic page support for HIGHMEM mm, oom: remove sleep from under oom_lock kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous() mm/cma: remove unsupported gfp_mask parameter from cma_alloc() ...
2018-08-17mm, vmacache: hash addresses based on pmdDavid Rientjes1-6/+0
When perf profiling a wide variety of different workloads, it was found that vmacache_find() had higher than expected cost: up to 0.08% of cpu utilization in some cases. This was found to rival other core VM functions such as alloc_pages_vma() with thp enabled and default mempolicy, and the conditionals in __get_vma_policy(). VMACACHE_HASH() determines which of the four per-task_struct slots a vma is cached for a particular address. This currently depends on the pfn, so pfn 5212 occupies a different vmacache slot than its neighboring pfn 5213. vmacache_find() iterates through all four of current's vmacache slots when looking up an address. Hashing based on pfn, an address has ~1/VMACACHE_SIZE chance of being cached in the first vmacache slot, or about 25%, *if* the vma is cached. This patch hashes an address by its pmd instead of pte to optimize for workloads with good spatial locality. This results in a higher probability of vmas being cached in the first slot that is checked: normally ~70% on the same workloads instead of 25%. [rientjes@google.com: various updates] Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807231532290.109445@chino.kir.corp.google.com Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807091749150.114630@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17mm/list_lru: introduce list_lru_shrink_walk_irq()Sebastian Andrzej Siewior1-0/+25
Provide list_lru_shrink_walk_irq() and let it behave like list_lru_walk_one() except that it locks the spinlock with spin_lock_irq(). This is used by scan_shadow_nodes() because its lock nests within the i_pages lock which is acquired with IRQ. This change allows to use proper locking promitives instead hand crafted lock_irq_disable() plus spin_lock(). There is no EXPORT_SYMBOL provided because the current user is in-kernel only. Add list_lru_shrink_walk_irq() which acquires the spinlock with the proper locking primitives. Link: http://lkml.kernel.org/r/20180716111921.5365-5-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>