aboutsummaryrefslogtreecommitdiffstats
path: root/mm/oom_kill.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2016-10-05pipe: add pipe_buf_get() helperMiklos Szeredi3-3/+14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05relay: simplify relay_file_read()Al Viro1-58/+20
to hell with actors... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05switch default_file_splice_read() to use of pipe-backed iov_iterAl Viro1-71/+40
we only use iov_iter_get_pages_alloc() and iov_iter_advance() - pages are filled by kernel_readv() via a kvec array (as we used to do all along), so iov_iter here is used only as a way of arranging for those pages to be in pipe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05switch generic_file_splice_read() to use of ->read_iter()Al Viro16-605/+58
... and kill the ->splice_read() instances that can be switched to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05new iov_iter flavour: pipe-backedAl Viro4-6/+408
iov_iter variant for passing data into pipe. copy_to_iter() copies data into page(s) it has allocated and stuffs them into the pipe; copy_page_to_iter() stuffs there a reference to the page given to it. Both will try to coalesce if possible. iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages() and friends will do as copy_to_iter() would have and return the pages where the data would've been copied. iov_iter_advance() will truncate everything past the spot it has advanced to. New primitive: iov_iter_pipe(), used for initializing those. pipe should be locked all along. Running out of space acts as fault would for iovec-backed ones; in other words, giving it to ->read_iter() may result in short read if the pipe overflows, or -EFAULT if it happens with nothing copied there. In other words, ->read_iter() on those acts pretty much like ->splice_read(). Moreover, all generic_file_splice_read() users, as well as many other ->splice_read() instances can be switched to that scheme - that'll happen in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03fuse_dev_splice_read(): switch to add_to_pipe()Al Viro1-37/+9
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03skb_splice_bits(): get rid of callbackAl Viro5-66/+6
since pipe_lock is the outermost now, we don't need to drop/regain socket locks around the call of splice_to_pipe() from skb_splice_bits(), which kills the need to have a socket-specific callback; we can just call splice_to_pipe() and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03new helper: add_to_pipe()Al Viro2-44/+64
single-buffer analogue of splice_to_pipe(); vmsplice_to_pipe() switched to that, leaving splice_to_pipe() only for ->splice_read() instances (and that only until they are converted as well). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03splice: lift pipe_lock out of splice_to_pipe()Al Viro2-73/+58
* splice_to_pipe() stops at pipe overflow and does *not* take pipe_lock * ->splice_read() instances do the same * vmsplice_to_pipe() and do_splice() (ultimate callers of splice_to_pipe()) arrange for waiting, looping, etc. themselves. That should make pipe_lock the outermost one. Unfortunately, existing rules for the amount passed by vmsplice_to_pipe() and do_splice() are quite ugly _and_ userland code can be easily broken by changing those. It's not even "no more than the maximal capacity of this pipe" - it's "once we'd fed pipe->nr_buffers pages into the pipe, leave instead of waiting". Considering how poorly these rules are documented, let's try "wait for some space to appear, unless given SPLICE_F_NONBLOCK, then push into pipe and if we run into overflow, we are done". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03splice: switch get_iovec_page_array() to iov_iterAl Viro1-99/+36
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03splice_to_pipe(): don't open-code wakeup_pipe_readers()Al Viro1-4/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03consistent treatment of EFAULT on O_DIRECT read/writeAl Viro1-0/+3
Make local filesystems treat a fault as shortened IO, returning -EFAULT only if nothing had been transferred. That's how everything else (NFS, FUSE, ceph, Lustre) behaves. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-25Linux 4.8-rc8Linus Torvalds1-1/+1
2016-09-25fault_in_multipages_readable() throws set-but-unused errorDave Chinner1-0/+1
When building XFS with -Werror, it now fails with: include/linux/pagemap.h: In function 'fault_in_multipages_readable': include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable] volatile char c; ^ This is a regression caused by commit e23d4159b109 ("fix fault_in_multipages_...() on architectures with no-op access_ok()"). Fix it by re-adding the "(void)c" trick taht was previously used to make the compiler think the variable is used. Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-25mm: check VMA flags to avoid invalid PROT_NONE NUMA balancingLorenzo Stoakes2-8/+7
The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively as requiring balancing upon a subsequent page fault. User-defined PROT_NONE memory regions which also have this flag set will not normally invoke the NUMA balancing code as do_page_fault() will send a segfault to the process before handle_mm_fault() is even called. However if access_remote_vm() is invoked to access a PROT_NONE region of memory, handle_mm_fault() is called via faultin_page() and __get_user_pages() without any access checks being performed, meaning the NUMA balancing logic is incorrectly invoked on a non-NUMA memory region. A simple means of triggering this problem is to access PROT_NONE mmap'd memory using /proc/self/mem which reliably results in the NUMA handling functions being invoked when CONFIG_NUMA_BALANCING is set. This issue was reported in bugzilla (issue 99101) which includes some simple repro code. There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at commit c0e7cad to avoid accidentally provoking strange behaviour by attempting to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are consistently triggered by the repro. This patch moves the PROT_NONE check into mm/memory.c rather than invoking BUG_ON() as faulting in these pages via faultin_page() is a valid reason for reaching the NUMA check with the PROT_NONE page table flag set and is therefore not always a bug. Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101 Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-25radix tree: fix sibling entry handling in radix_tree_descend()Linus Torvalds1-4/+4
The fixes to the radix tree test suite show that the multi-order case is broken. The basic reason is that the radix tree code uses tagged pointers with the "internal" bit in the low bits, and calculating the pointer indices was supposed to mask off those bits. But gcc will notice that we then use the index to re-create the pointer, and will avoid doing the arithmetic and use the tagged pointer directly. This cleans the code up, using the existing is_sibling_entry() helper to validate the sibling pointer range (instead of open-coding it), and using entry_to_node() to mask off the low tag bit from the pointer. And once you do that, you might as well just use the now cleaned-up pointer directly. [ Side note: the multi-order code isn't actually ever used in the kernel right now, and the only reason I didn't just delete all that code is that Kirill Shutemov piped up and said: "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries. It also converts shmem-with-huge-pages and hugetlb to them. I'm okay with converting it to other mechanism, but I need something. (I looked into Konstantin's RFC patchset[2]. It looks okay, but I don't feel myself qualified to review it as I don't know much about radix-tree internals.)" [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ] Reported-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Cedric Blancher <cedric.blancher@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-25radix tree test suite: Test radix_tree_replace_slot() for multiorder entriesMatthew Wilcox2-5/+13
When we replace a multiorder entry, check that all indices reflect the new value. Also, compile the test suite with -O2, which shows other problems with the code due to some dodgy pointer operations in the radix tree code. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-25fix memory leaks in tracing_buffers_splice_read()Al Viro1-6/+8
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-25tracing: Move mutex to protect against resetting of seq dataSteven Rostedt (Red Hat)1-7/+8
The iter->seq can be reset outside the protection of the mutex. So can reading of user data. Move the mutex up to the beginning of the function. Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants") Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-25MIPS: Fix delay slot emulation count in debugfsPaul Burton1-0/+1
Commit 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions") accidentally removed use of the MIPS_FPU_EMU_INC_STATS macro from do_dsemulret, leading to the ds_emul file in debugfs always returning zero even though we perform delay slot emulations. Fix this by re-adding the use of the MIPS_FPU_EMU_INC_STATS macro. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14301/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-25MIPS: SMP: Fix possibility of deadlock when bringing CPUs onlineMatt Redfearn1-4/+3
This patch fixes the possibility of a deadlock when bringing up secondary CPUs. The deadlock occurs because the set_cpu_online() is called before synchronise_count_slave(). This can cause a deadlock if the boot CPU, having scheduled another thread, attempts to send an IPI to the secondary CPU, which it sees has been marked online. The secondary is blocked in synchronise_count_slave() waiting for the boot CPU to enter synchronise_count_master(), but the boot cpu is blocked in smp_call_function_many() waiting for the secondary to respond to it's IPI request. Fix this by marking the CPU online in cpu_callin_map and synchronising counters before declaring the CPU online and calculating the maps for IPIs. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Reported-by: Justin Chen <justinpopo6@gmail.com> Tested-by: Justin Chen <justinpopo6@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: stable@vger.kernel.org # v4.1+ Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14302/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-24mm: delete unnecessary and unsafe init_tlb_ubc()Hugh Dickins1-19/+0
init_tlb_ubc() looked unnecessary to me: tlb_ubc is statically initialized with zeroes in the init_task, and copied from parent to child while it is quiescent in arch_dup_task_struct(); so I went to delete it. But inserted temporary debug WARN_ONs in place of init_tlb_ubc() to check that it was always empty at that point, and found them firing: because memcg reclaim can recurse into global reclaim (when allocating biosets for swapout in my case), and arrive back at the init_tlb_ubc() in shrink_node_memcg(). Resetting tlb_ubc.flush_required at that point is wrong: if the upper level needs a deferred TLB flush, but the lower level turns out not to, we miss a TLB flush. But fortunately, that's the only part of the protocol that does not nest: with the initialization removed, cpumask collects bits from upper and lower levels, and flushes TLB when needed. Fixes: 72b252aed506 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: stable@vger.kernel.org # 4.3+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-24huge tmpfs: fix Committed_AS leakHugh Dickins1-1/+2
Under swapping load on huge tmpfs, /proc/meminfo's Committed_AS grows bigger and bigger: just a cosmetic issue for most users, but disabling for those who run without overcommit (/proc/sys/vm/overcommit_memory 2). shmem_uncharge() was forgetting to unaccount __vm_enough_memory's charge, and shmem_charge() was forgetting it on the filesystem-full error path. Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-24shmem: fix tmpfs to handle the huge= option properlyToshi Kani1-1/+1
shmem_get_unmapped_area() checks SHMEM_SB(sb)->huge incorrectly, which leads to a reversed effect of "huge=" mount option. Fix the check in shmem_get_unmapped_area(). Note, the default value of SHMEM_SB(sb)->huge remains as SHMEM_HUGE_NEVER. User will need to specify "huge=" option to enable huge page mappings. Reported-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-23blk-mq: skip unmapped queues in blk_mq_alloc_request_hctxChristoph Hellwig1-2/+14
This provides the caller a feedback that a given hctx is not mapped and thus no command can be sent on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-23MIPS: Fix pre-r6 emulation FPU initialisationPaul Burton1-0/+2
In the mipsr2_decoder() function, used to emulate pre-MIPSr6 instructions that were removed in MIPSr6, the init_fpu() function is called if a removed pre-MIPSr6 floating point instruction is the first floating point instruction used by the task. However, init_fpu() performs varous actions that rely upon not being migrated. For example in the most basic case it sets the coprocessor 0 Status.CU1 bit to enable the FPU & then loads FP register context into the FPU registers. If the task were to migrate during this time, it may end up attempting to load FP register context on a different CPU where it hasn't set the CU1 bit, leading to errors such as: do_cpu invoked from kernel context![#2]: CPU: 2 PID: 7338 Comm: fp-prctl Tainted: G D 4.7.0-00424-g49b0c82 #2 task: 838e4000 ti: 88d38000 task.ti: 88d38000 $ 0 : 00000000 00000001 ffffffff 88d3fef8 $ 4 : 838e4000 88d38004 00000000 00000001 $ 8 : 3400fc01 801f8020 808e9100 24000000 $12 : dbffffff 807b69d8 807b0000 00000000 $16 : 00000000 80786150 00400fc4 809c0398 $20 : 809c0338 0040273c 88d3ff28 808e9d30 $24 : 808e9d30 00400fb4 $28 : 88d38000 88d3fe88 00000000 8011a2ac Hi : 0040273c Lo : 88d3ff28 epc : 80114178 _restore_fp+0x10/0xa0 ra : 8011a2ac mipsr2_decoder+0xd5c/0x1660 Status: 1400fc03 KERNEL EXL IE Cause : 1080002c (ExcCode 0b) PrId : 0001a920 (MIPS I6400) Modules linked in: Process fp-prctl (pid: 7338, threadinfo=88d38000, task=838e4000, tls=766527d0) Stack : 00000000 00000000 00000000 88d3fe98 00000000 00000000 809c0398 809c0338 808e9100 00000000 88d3ff28 00400fc4 00400fc4 0040273c 7fb69e18 004a0000 004a0000 004a0000 7664add0 8010de18 00000000 00000000 88d3fef8 88d3ff28 808e9100 00000000 766527d0 8010e534 000c0000 85755000 8181d580 00000000 00000000 00000000 004a0000 00000000 766527d0 7fb69e18 004a0000 80105c20 ... Call Trace: [<80114178>] _restore_fp+0x10/0xa0 [<8011a2ac>] mipsr2_decoder+0xd5c/0x1660 [<8010de18>] do_ri+0x90/0x6b8 [<80105c20>] ret_from_exception+0x0/0x10 Fix this by disabling preemption around the call to init_fpu(), ensuring that it starts & completes on one CPU. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: b0a668fb2038 ("MIPS: kernel: mips-r2-to-r6-emul: Add R2 emulator for MIPS R6") Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.0+ Patchwork: https://patchwork.linux-mips.org/patch/14305/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-23arm64: kgdb: handle read-only text / modulesAKASHI Takahiro2-14/+24
Handle read-only cases when CONFIG_DEBUG_RODATA (4.0) or CONFIG_DEBUG_SET_MODULE_RONX (3.18) are enabled by using aarch64_insn_write() instead of probe_kernel_write() as introduced by commit 2f896d586610 ("arm64: use fixmap for text patching") in 4.0. Fixes: 11d91a770f1f ("arm64: Add CONFIG_DEBUG_SET_MODULE_RONX support") Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-09-23arm64: Call numa_store_cpu_info() earlier.David Daney1-8/+6
The wq_numa_init() function makes a private CPU to node map by calling cpu_to_node() early in the boot process, before the non-boot CPUs are brought online. Since the default implementation of cpu_to_node() returns zero for CPUs that have never been brought online, the workqueue system's view is that *all* CPUs are on node zero. When the unbound workqueue for a non-zero node is created, the tsk_cpus_allowed() for the worker threads is the empty set because there are, in the view of the workqueue system, no CPUs on non-zero nodes. The code in try_to_wake_up() using this empty cpumask ends up using the cpumask empty set value of NR_CPUS as an index into the per-CPU area pointer array, and gets garbage as it is one past the end of the array. This results in: [ 0.881970] Unable to handle kernel paging request at virtual address fffffb1008b926a4 [ 1.970095] pgd = fffffc00094b0000 [ 1.973530] [fffffb1008b926a4] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000 [ 1.982610] Internal error: Oops: 96000004 [#1] SMP [ 1.987541] Modules linked in: [ 1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: G W 4.8.0-rc6-preempt-vol+ #9 [ 1.999435] Hardware name: Cavium ThunderX CN88XX board (DT) [ 2.005159] task: fffffe0fe89cc300 task.stack: fffffe0fe8b8c000 [ 2.011158] PC is at try_to_wake_up+0x194/0x34c [ 2.015737] LR is at try_to_wake_up+0x150/0x34c [ 2.020318] pc : [<fffffc00080e7468>] lr : [<fffffc00080e7424>] pstate: 600000c5 [ 2.027803] sp : fffffe0fe8b8fb10 [ 2.031149] x29: fffffe0fe8b8fb10 x28: 0000000000000000 [ 2.036522] x27: fffffc0008c63bc8 x26: 0000000000001000 [ 2.041896] x25: fffffc0008c63c80 x24: fffffc0008bfb200 [ 2.047270] x23: 00000000000000c0 x22: 0000000000000004 [ 2.052642] x21: fffffe0fe89d25bc x20: 0000000000001000 [ 2.058014] x19: fffffe0fe89d1d00 x18: 0000000000000000 [ 2.063386] x17: 0000000000000000 x16: 0000000000000000 [ 2.068760] x15: 0000000000000018 x14: 0000000000000000 [ 2.074133] x13: 0000000000000000 x12: 0000000000000000 [ 2.079505] x11: 0000000000000000 x10: 0000000000000000 [ 2.084879] x9 : 0000000000000000 x8 : 0000000000000000 [ 2.090251] x7 : 0000000000000040 x6 : 0000000000000000 [ 2.095621] x5 : ffffffffffffffff x4 : 0000000000000000 [ 2.100991] x3 : 0000000000000000 x2 : 0000000000000000 [ 2.106364] x1 : fffffc0008be4c24 x0 : ffffff0ffffada80 [ 2.111737] [ 2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfffffe0fe8b8c020) [ 2.120102] Stack: (0xfffffe0fe8b8fb10 to 0xfffffe0fe8b90000) [ 2.125914] fb00: fffffe0fe8b8fb80 fffffc00080e7648 . . . [ 2.442859] Call trace: [ 2.445327] Exception stack(0xfffffe0fe8b8f940 to 0xfffffe0fe8b8fa70) [ 2.451843] f940: fffffe0fe89d1d00 0000040000000000 fffffe0fe8b8fb10 fffffc00080e7468 [ 2.459767] f960: fffffe0fe8b8f980 fffffc00080e4958 ffffff0ff91ab200 fffffc00080e4b64 [ 2.467690] f980: fffffe0fe8b8f9d0 fffffc00080e515c fffffe0fe8b8fa80 0000000000000000 [ 2.475614] f9a0: fffffe0fe8b8f9d0 fffffc00080e58e4 fffffe0fe8b8fa80 0000000000000000 [ 2.483540] f9c0: fffffe0fe8d10000 0000000000000040 fffffe0fe8b8fa50 fffffc00080e5ac4 [ 2.491465] f9e0: ffffff0ffffada80 fffffc0008be4c24 0000000000000000 0000000000000000 [ 2.499387] fa00: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000040 [ 2.507309] fa20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 2.515233] fa40: 0000000000000000 0000000000000000 0000000000000000 0000000000000018 [ 2.523156] fa60: 0000000000000000 0000000000000000 [ 2.528089] [<fffffc00080e7468>] try_to_wake_up+0x194/0x34c [ 2.533723] [<fffffc00080e7648>] wake_up_process+0x28/0x34 [ 2.539275] [<fffffc00080d3764>] create_worker+0x110/0x19c [ 2.544824] [<fffffc00080d69dc>] alloc_unbound_pwq+0x3cc/0x4b0 [ 2.550724] [<fffffc00080d6bcc>] wq_update_unbound_numa+0x10c/0x1e4 [ 2.557066] [<fffffc00080d7d78>] workqueue_online_cpu+0x220/0x28c [ 2.563234] [<fffffc00080bd288>] cpuhp_invoke_callback+0x6c/0x168 [ 2.569398] [<fffffc00080bdf74>] cpuhp_up_callbacks+0x44/0xe4 [ 2.575210] [<fffffc00080be194>] cpuhp_thread_fun+0x13c/0x148 [ 2.581027] [<fffffc00080dfbac>] smpboot_thread_fn+0x19c/0x1a8 [ 2.586929] [<fffffc00080dbd64>] kthread+0xdc/0xf0 [ 2.591776] [<fffffc0008083380>] ret_from_fork+0x10/0x50 [ 2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821) [ 2.603464] ---[ end trace 58c0cd36b88802bc ]--- [ 2.608138] Kernel panic - not syncing: Fatal exception Fix by moving call to numa_store_cpu_info() for all CPUs into smp_prepare_cpus(), which happens before wq_numa_init(). Since smp_store_cpu_info() now contains only a single function call, simplify by removing the function and out-lining its contents. Suggested-by: Robert Richter <rric@kernel.org> Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.") Cc: <stable@vger.kernel.org> # 4.7.x- Signed-off-by: David Daney <david.daney@cavium.com> Reviewed-by: Robert Richter <rrichter@cavium.com> Tested-by: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-09-23locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help textVivien Didelot1-1/+1
Fix the indefinitiley -> indefinitely typo in Kconfig.debug. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160922205513.17821-1-vivien.didelot@savoirfairelinux.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22nvme-rdma: only clear queue flags after successful connectSagi Grimberg1-1/+1
Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly try to stop/free rdma qps/cm_ids that are already freed. Fixes: e89ca58f9c90 ("nvme-rdma: add DELETING queue flag") Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-22i2c: qup: skip qup_i2c_suspend if the device is already runtime suspendedSudeep Holla1-1/+2
If the i2c device is already runtime suspended, if qup_i2c_suspend is executed during suspend-to-idle or suspend-to-ram it will result in the following splat: WARNING: CPU: 3 PID: 1593 at drivers/clk/clk.c:476 clk_core_unprepare+0x80/0x90 Modules linked in: CPU: 3 PID: 1593 Comm: bash Tainted: G W 4.8.0-rc3 #14 Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) PC is at clk_core_unprepare+0x80/0x90 LR is at clk_unprepare+0x28/0x40 pc : [<ffff0000086eecf0>] lr : [<ffff0000086f0c58>] pstate: 60000145 Call trace: clk_core_unprepare+0x80/0x90 qup_i2c_disable_clocks+0x2c/0x68 qup_i2c_suspend+0x10/0x20 platform_pm_suspend+0x24/0x68 ... This patch fixes the issue by executing qup_i2c_pm_suspend_runtime conditionally in qup_i2c_suspend. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2016-09-22perf/core: Limit matching exclusive events to one PMUAlexander Shishkin1-1/+1
An "exclusive" PMU is the one that can only have one event scheduled in at any given time. There may be more than one of such PMUs in a system, though, like Intel PT and BTS. It should be allowed to have one event for either of those inside the same context (there may be other constraints that may prevent this, but those would be hardware-specific). However, the exclusivity code is written so that only one event from any of the "exclusive" PMUs is allowed in a context. Fix this by making the exclusive event filter explicitly match two events' PMUs. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160920154811.3255-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22perf/x86/intel/bts: Make it an exclusive PMUAlexander Shishkin1-1/+2
Just like intel_pt, intel_bts can only handle one event at a time, which is the reason we introduced PERF_PMU_CAP_EXCLUSIVE in the first place. However, at the moment one can have as many intel_bts events within the same context at the same time as one pleases. Only one of them, however, will get scheduled and receive the actual trace data. Fix this by making intel_bts an "exclusive" PMU. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160920154811.3255-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()Peter Zijlstra1-1/+1
We cannot use the "z" constraint twice, since its a single register (r0). Change the one not used by movli.l/movco.l to "r". Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22regmap: fix deadlock on _regmap_raw_write() error pathNikita Yushchenko1-1/+5
Commit 815806e39bf6 ("regmap: drop cache if the bus transfer error") added a call to regcache_drop_region() to error path in _regmap_raw_write(). However that path runs with regmap lock taken, and regcache_drop_region() tries to re-take it, causing a deadlock. Fix that by calling map->cache_ops->drop() directly. Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-09-22crypto: rsa-pkcs1pad - Handle leading zero for decryptionHerbert Xu1-17/+24
As the software RSA implementation now produces fixed-length output, we need to eliminate leading zeros in the calling code instead. This patch does just that for pkcs1pad decryption while signature verification was fixed in an earlier patch. Fixes: 9b45b7bba3d2 ("crypto: rsa - Generate fixed-length output") Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-09-22KEYS: Fix skcipher IV clobberingHerbert Xu1-4/+7
The IV must not be modified by the skcipher operation so we need to duplicate it. Fixes: c3917fd9dfbc ("KEYS: Use skcipher") Cc: stable@vger.kernel.org Reported-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-09-22mmc: dw_mmc: fix the spamming log messageJaehoon Chung2-5/+12
When there is no Card which is set to "broken-cd", it's displayed a clock information continuously. Because it's polling for detecting card. This patch is fixed this problem. Fixes: 65257a0deed5 ("mmc: dw_mmc: remove UBSAN warning in dw_mci_setup_bus()") Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-09-22tcp: properly account Fast Open SYN-ACK retransYuchung Cheng3-1/+4
Since the TFO socket is accepted right off SYN-data, the socket owner can call getsockopt(TCP_INFO) to collect ongoing SYN-ACK retransmission or timeout stats (i.e., tcpi_total_retrans, tcpi_retransmits). Currently those stats are only updated upon handshake completes. This patch fixes it. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22tcp: fix under-accounting retransmit SNMP countersYuchung Cheng1-1/+1
This patch fixes these under-accounting SNMP rtx stats LINUX_MIB_TCPFORWARDRETRANS LINUX_MIB_TCPFASTRETRANS LINUX_MIB_TCPSLOWSTARTRETRANS when retransmitting TSO packets Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22MAINTAINERS: Update b44 maintainer.Michael Chan1-1/+1
Taking over as maintainer since Gary Zambrano is no longer working for Broadcom. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22net: get rid of an signed integer overflow in ip_idents_reserve()Eric Dumazet1-2/+8
Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve() [] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11 [] signed integer overflow: [] -2117905507 + -695755206 cannot be represented in type 'int' Since we do not have uatomic_add_return() yet, use atomic_cmpxchg() so that the arithmetics can be done using unsigned int. Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22net/mlx4_core: Fix to clean devlink resourcesKamal Heib1-0/+3
This patch cleans devlink resources by calling devlink_port_unregister() to avoid the following issues: - Kernel panic when triggering reset flow. - Memory leak due to unfreed resources in mlx4_init_port_info(). Fixes: 09d4d087cd48 ("mlx4: Implement devlink interface") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21btrfs: ensure that file descriptor used with subvol ioctls is a dirJeff Mahoney1-0/+12
If the subvol/snapshot create/destroy ioctls are passed a regular file with execute permissions set, we'll eventually Oops while trying to do inode->i_op->lookup via lookup_one_len. This patch ensures that the file descriptor refers to a directory. Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules) Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl) Cc: <stable@vger.kernel.org> #v2.6.29+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-09-21Btrfs: handle quota reserve failure properlyJosef Bacik1-6/+3
btrfs/022 was spitting a warning for the case that we exceed the quota. If we fail to make our quota reservation we need to clean up our data space reservation. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Tested-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-09-21i2c: mux: pca954x: retry updating the mux selection on failurePeter Rosin1-1/+1
The cached value of the last selected channel prevents retries on the next call, even on failure to update the selected channel. Fix that. Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2016-09-21i2c-eg20t: fix race between i2c init and interrupt enableYadi.hu1-7/+11
the eg20t driver call request_irq() function before the pch_base_address, base address of i2c controller's register, is assigned an effective value. there is one possible scenario that an interrupt which isn't inside eg20t arrives immediately after request_irq() is executed when i2c controller shares an interrupt number with others. since the interrupt handler pch_i2c_handler() has already active as shared action, it will be called and read its own register to determine if this interrupt is from itself. At that moment, since base address of i2c registers is not remapped in kernel space yet,so the INT handler will access an illegal address and then a error occurs. Signed-off-by: Yadi.hu <yadi.hu@windriver.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2016-09-21MIPS: vDSO: Fix Malta EVA mapping to vDSO page structsJames Hogan1-4/+4
The page structures associated with the vDSO pages in the kernel image are calculated using virt_to_page(), which uses __pa() under the hood to find the pfn associated with the virtual address. The vDSO data pointers however point to kernel symbols, so __pa_symbol() should really be used instead. Since there is no equivalent to virt_to_page() which uses __pa_symbol(), fix init_vdso_image() to work directly with pfns, calculated with __phys_to_pfn(__pa_symbol(...)). This issue broke the Malta Enhanced Virtual Addressing (EVA) configuration which has a non-default implementation of __pa_symbol(). This is because it uses a physical alias so that the kernel executes from KSeg0 (VA 0x80000000 -> PA 0x00000000), while RAM is provided to the kernel in the KUSeg range (VA 0x00000000 -> PA 0x80000000) which uses the same underlying RAM. Since there are no page structures associated with the low physical address region, some arbitrary kernel memory would be interpreted as a page structure for the vDSO pages and badness ensues. Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 4.4.x- Patchwork: https://patchwork.linux-mips.org/patch/14229/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-21net: can: ifi: Configure transmitter delayMarek Vasut1-1/+10
Configure the transmitter delay register at +0x1c to correctly handle the CAN FD bitrate switch (BRS). This moves the SSP (secondary sample point) to a proper offset, so that the TDC mechanism works and won't generate error frames on the CAN link. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-09-21vti6: fix input pathNicolas Dichtel4-10/+16
Since commit 1625f4529957, vti6 is broken, all input packets are dropped (LINUX_MIB_XFRMINNOSTATES is incremented). XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in xfrm6_rcv_spi(). A new function xfrm6_rcv_tnl() that enables to pass a value to xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function is used in several handlers). CC: Alexey Kodanev <alexey.kodanev@oracle.com> Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>