aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/exported-sql-viewer.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2018-12-05x86/mm: Validate kernel_physical_mapping_init() PTE populationDan Williams6-12/+43
The usage of __flush_tlb_all() in the kernel_physical_mapping_init() path is not necessary. In general flushing the TLB is not required when updating an entry from the !present state. However, to give confidence in the future removal of TLB flushing in this path, use the new set_pte_safe() family of helpers to assert that the !present assumption is true in this path. [ mingo: Minor readability edits. ] Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154395944177.32119.8524957429632012270.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-05generic/pgtable: Introduce set_pte_safe()Dan Williams1-0/+38
Commit: f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" introduced a warning to capture cases __flush_tlb_all() is called without pre-emption disabled. It triggers a false positive warning in the memory hotplug path. On investigation it was found that the __flush_tlb_all() calls are not necessary. However, they are only "not necessary" in practice provided the ptes are being initially populated from the !present state. Introduce set_pte_safe() as a sanity check that the pte is being updated in a way that does not require a TLB flush. Forgive the macro, the availability of the various of set_pte() levels is hit and miss across architectures. [ mingo: Minor readability edits. ] Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/279dadae-9148-465c-7ec6-3f37e026c6c9@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-05generic/pgtable: Introduce {p4d,pgd}_same()Dan Williams1-0/+14
In preparation for introducing '_safe' versions of page table entry 'set' helpers, introduce generic versions of p4d_same() and pgd_same(). Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154395943153.32119.1733586547359626311.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-05generic/pgtable: Make {pmd, pud}_same() unconditionally availableDan Williams1-14/+0
In preparation for {pmd,pud}_same() to be used outside of transparent huge page code paths, make them unconditionally available. This enables them to be used in the definition of a new family of set_{pte,pmd,pud,p4d,pgd}_safe() helpers. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154395942644.32119.10238934183949418128.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22x86/fault: Clean up the page fault oops decoder a bitIngo Molnar1-15/+23
- Make the oops messages a bit less scary (don't mention 'HW errors') - Turn 'PROT USER' (which is visually easily confused with PROT_USER) into individual bit descriptors: "[PROT] [USER]". This also makes "[normal kernel read fault]" more apparent. - De-abbreviate variables to make the code easier to read - Use vertical alignment where appropriate. - Add comment about string size limits and the helper function. - Remove unnecessary line breaks. Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22x86/fault: Decode page fault OOPSes betterAndy Lutomirski1-0/+84
One of Linus' favorite hobbies seems to be looking at OOPSes and decoding the error code in his head. This is not one of my favorite hobbies :) Teach the page fault OOPS hander to decode the error code. If it's a !USER fault from user mode, print an explicit note to that effect and print out the addresses of various tables that might cause such an error. With this patch applied, if I intentionally point the LDT at 0x0 and run the x86 selftests, I get: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 HW error: normal kernel read fault This was a system access from user code IDT: 0xfffffe0000000000 (limit=0xfff) GDT: 0xfffffe0000001000 (limit=0x7f) LDTR: 0x50 -- base=0x0 limit=0xfff7 TR: 0x40 -- base=0xfffffe0000003000 limit=0x206f PGD 800000000456e067 P4D 800000000456e067 PUD 4623067 PMD 0 SMP PTI CPU: 0 PID: 153 Comm: ldt_gdt_64 Not tainted 4.19.0+ #1317 Hardware name: ... RIP: 0033:0x401454 Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/11212acb25980cd1b3030875cd9502414fbb214d.1542841400.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22x86/vsyscall/64: Use X86_PF constants in the simulated #PF error codeAndy Lutomirski1-1/+1
Rather than hardcoding 6 with a comment, use the defined constants. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/e023f20352b0d05a8b0205629897917262d2ad68.1542841400.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22x86/oops: Show the correct CS value in show_regs()Andy Lutomirski1-3/+2
show_regs() shows the CS in the CPU register instead of the value in regs. This means that we'll probably print "CS: 0010" almost all the time regardless of what was actually in CS when the kernel malfunctioned. This gives a particularly confusing result if we OOPSed due to an implicit supervisor access from user mode. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/4e36812b6e1e95236a812021d35cbf22746b5af6.1542841400.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22x86/fault: Don't try to recover from an implicit supervisor accessAndy Lutomirski1-0/+10
This avoids a situation in which we attempt to apply various fixups that are not intended to handle implicit supervisor accesses from user mode if we screw up in a way that causes this type of fault. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/9999f151d72ff352265f3274c5ab3a4105090f49.1542841400.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22x86/fault: Remove sw_error_codeAndy Lutomirski1-39/+11
All of the fault handling code now corrently checks user_mode(regs) as needed, and nothing depends on the X86_PF_USER bit being munged. Get rid of the sw_error code and use hw_error_code everywhere. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/078f5b8ae6e8c79ff8ee7345b5c476c45003e5ac.1542841400.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20x86/fault: Don't set thread.cr2, etc before OOPSingAndy Lutomirski1-8/+0
The fault handling code sets the cr2, trap_nr, and error_code fields in thread_struct before OOPSing. No one reads those fields during an OOPS, so remove the code to set them. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/d418022aa0fad9cb40467aa7acaf4e95be50ee96.1542667307.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20x86/fault: Make error_code sanitization more robustAndy Lutomirski1-9/+21
The error code in a page fault on a kernel address indicates whether that address is mapped, which should not be revealed in a signal. The normal code path for a page fault on a kernel address sanitizes the bit, but the paths for vsyscall emulation and SIGBUS do not. Both are harmless, but for subtle reasons. SIGBUS is never sent for a kernel address, and vsyscall emulation will never fault on a kernel address per se because it will fail an access_ok() check instead. Make the code more robust by adding a helper that sets the relevant fields and sanitizing the error code in the helper. This also cleans up the code -- we had three copies of roughly the same thing. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/b31159bd55bd0c4fa061a20dfd6c429c094bebaa.1542667307.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20x86/fault: Improve the condition for signalling vs OOPSingAndy Lutomirski1-1/+1
__bad_area_nosemaphore() currently checks the X86_PF_USER bit in the error code to decide whether to send a signal or to treat the fault as a kernel error. This can cause somewhat erratic behavior. The straightforward cases where the CPL agrees with the hardware USER bit are all correct, but the other cases are confusing. - A user instruction accessing a kernel address with supervisor privilege (e.g. a descriptor table access failed). The USER bit will be clear, and we OOPS. This is correct, because it indicates a kernel bug, not a user error. - A user instruction accessing a user address with supervisor privilege (e.g. a descriptor table was incorrectly pointing at user memory). __bad_area_nosemaphore() will be passed a modified error code with the user bit set, and we will send a signal. Sending the signal will work (because the regs and the entry frame genuinely come from user mode), but we really ought to OOPS, as this event indicates a severe kernel bug. - A kernel instruction with user privilege (i.e. WRUSS). This should OOPS or get fixed up. The current code would instead try send a signal and malfunction. Change the logic: a signal should be sent if the faulting context is user mode *and* the access has user privilege. Otherwise it's either a kernel mode fault or a failed implicit access, either of which should end up in no_context(). Note to -stable maintainers: don't backport this unless you backport CET. The bug it fixes is unobservable in current kernels unless something is extremely wrong. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/10e509c43893170e262e82027ea399130ae81159.1542667307.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20x86/fault: Fix SMAP #PF handling buglet for implicit supervisor accessesAndy Lutomirski1-3/+6
Currently, if a user program somehow triggers an implicit supervisor access to a user address (e.g. if the kernel somehow sets LDTR to a user address), it will be incorrectly detected as a SMAP violation if AC is clear and SMAP is enabled. This is incorrect -- the error has nothing to do with SMAP. Fix the condition so that only accesses with the hardware USER bit set are diagnosed as SMAP violations. With the logic fixed, an implicit supervisor access to a user address will hit the code lower in the function that is intended to handle it even if SMAP is enabled. That logic is still a bit buggy, and later patches will clean it up. I *think* this code is still correct for WRUSS, and I've added a comment to that effect. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/d1d1b2e66ef31f884dba172084486ea9423ddcdb.1542667307.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20x86/fault: Fold smap_violation() into do_user_addr_fault()Andy Lutomirski1-17/+6
smap_violation() has a single caller, and the contents are a bit nonsensical. I'm going to fix it, but first let's fold it into its caller for ease of comprehension. In this particular case, the user_mode(regs) check is incorrect -- it will cause false positives in the case of a user-initiated kernel-privileged access. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/806c366f6ca861152398ce2c01744d59d9aceb6d.1542667307.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20x86/cpufeatures, x86/fault: Mark SMAP as disabled when configured outAndy Lutomirski2-5/+8
Add X86_FEATURE_SMAP to the disabled features mask as appropriate and use cpu_feature_enabled() in the fault code. This lets us get rid of a redundant IS_ENABLED(CONFIG_X86_SMAP). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/fe93332eded3d702f0b0b4cf83928d6830739ba3.1542667307.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20x86/fault: Check user_mode(regs) when avoiding an mmap_sem deadlockAndy Lutomirski1-5/+2
The fault-handling code that takes mmap_sem needs to avoid a deadlock that could occur if the kernel took a bad (OOPS-worthy) page fault on a user address while holding mmap_sem. This can only happen if the faulting instruction was in the kernel (i.e. user_mode(regs)). Rather than checking the sw_error_code (which will have the USER bit set if the fault was a USER-permission access *or* if user_mode(regs)), just check user_mode(regs) directly. The old code would have malfunctioned if the kernel executed a bogus WRUSS instruction while holding mmap_sem. Fortunately, that is extremely unlikely in current kernels, which don't use WRUSS. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/4b89b542e8ceba9bd6abde2f386afed6d99244a9.1542667307.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12x86/mm/fault: Allow stack access below %rspWaiman Long1-12/+0
The current x86 page fault handler allows stack access below the stack pointer if it is no more than 64k+256 bytes. Any access beyond the 64k+ limit will cause a segmentation fault. The gcc -fstack-check option generates code to probe the stack for large stack allocation to see if the stack is accessible. The newer gcc does that while updating the %rsp simultaneously. Older gcc's like gcc4 doesn't do that. As a result, an application compiled with an old gcc and the -fstack-check option may fail to start at all: $ cat test.c int main() { char tmp[1024*128]; printf("### ok\n"); return 0; } $ gcc -fstack-check -g -o test test.c $ ./test Segmentation fault The old binary was working in older kernels where expand_stack() was somehow called before the check. But it is not working in newer kernels. Besides, the 64k+ limit check is kind of crude and will not catch a lot of mistakes that userspace applications may be misbehaving anyway. I think the kernel isn't the right place for this kind of tests. We should leave it to userspace instrumentation tools to perform them. The 64k+ limit check is now removed to just let expand_stack() decide if a segmentation fault should happen, when the RLIMIT_STACK limit is exceeded, for example. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1541535149-31963-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-11Linux 4.20-rc2Linus Torvalds1-1/+1
2018-11-11act_mirred: clear skb->tstamp on redirectEric Dumazet2-10/+2
If sch_fq is used at ingress, skbs that might have been timestamped by net_timestamp_set() if a packet capture is requesting timestamps could be delayed by arbitrary amount of time, since sch_fq time base is MONOTONIC. Fix this problem by moving code from sch_netem.c to act_mirred.c. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11net: dsa: mv88e6xxx: Fix clearing of stats countersAndrew Lunn1-0/+2
The mv88e6161 would sometime fail to probe with a timeout waiting for the switch to complete an operation. This operation is supposed to clear the statistics counters. However, due to a read/modify/write, without the needed mask, the operation actually carried out was more random, with invalid parameters, resulting in the switch not responding. We need to preserve the histogram mode bits, so apply a mask to keep them. Reported-by: Chris Healy <Chris.Healy@zii.aero> Fixes: 40cff8fca9e3 ("net: dsa: mv88e6xxx: Fix stats histogram mode") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11tipc: fix link re-establish failureJon Maloy1-4/+7
When a link failure is detected locally, the link is reset, the flag link->in_session is set to false, and a RESET_MSG with the 'stopping' bit set is sent to the peer. The purpose of this bit is to inform the peer that this endpoint just is going down, and that the peer should handle the reception of this particular RESET message as a local failure. This forces the peer to accept another RESET or ACTIVATE message from this endpoint before it can re-establish the link. This again is necessary to ensure that link session numbers are properly exchanged before the link comes up again. If a failure is detected locally at the same time at the peer endpoint this will do the same, which is also a correct behavior. However, when receiving such messages, the endpoints will not distinguish between 'stopping' RESETs and ordinary ones when it comes to updating session numbers. Both endpoints will copy the received session number and set their 'in_session' flags to true at the reception, while they are still expecting another RESET from the peer before they can go ahead and re-establish. This is contradictory, since, after applying the validation check referred to below, the 'in_session' flag will cause rejection of all such messages, and the link will never come up again. We now fix this by not only handling received RESET/STOPPING messages as a local failure, but also by omitting to set a new session number and the 'in_session' flag in such cases. Fixes: 7ea817f4e832 ("tipc: check session number before accepting link protocol messages") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11builddeb: Fix inclusion of dtbs in debian packageRob Herring1-2/+2
Commit 37c8a5fafa3b ("kbuild: consolidate Devicetree dtb build rules") moved the location of 'dtbs_install' target which caused dtbs to not be installed when building debian package with 'bindeb-pkg' target. Update the builddeb script to use the same logic that determines if there's a 'dtbs_install' target which is presence of the arch dts directory. Also, use CONFIG_OF_EARLY_FLATTREE instead of CONFIG_OF as that's a better indication of whether we are building dtbs. This commit will also have the side effect of installing dtbs on any arch that has dts files. Previously, it was dependent on whether the arch defined 'dtbs_install'. Fixes: 37c8a5fafa3b ("kbuild: consolidate Devicetree dtb build rules") Reported-by: Nuno Gonçalves <nunojpg@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-11-11Revert "scripts/setlocalversion: git: Make -dirty check more robust"Guenter Roeck1-1/+1
This reverts commit 6147b1cf19651c7de297e69108b141fb30aa2349. The reverted patch results in attempted write access to the source repository, even if that repository is mounted read-only. Output from "strace git status -uno --porcelain": getcwd("/tmp/linux-test", 129) = 16 open("/tmp/linux-test/.git/index.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = -1 EROFS (Read-only file system) While git appears to be able to handle this situation, a monitored build environment (such as the one used for Chrome OS kernel builds) may detect it and bail out with an access violation error. On top of that, the attempted write access suggests that git _will_ write to the file even if a build output directory is specified. Users may have the reasonable expectation that the source repository remains untouched in that situation. Fixes: 6147b1cf19651 ("scripts/setlocalversion: git: Make -dirty check more robust" Cc: Genki Sky <sky@genki.is> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-11-11kbuild: deb-pkg: fix too low build version numberMasahiro Yamada1-2/+5
Since commit b41d920acff8 ("kbuild: deb-pkg: split generating packaging and build"), the build version of the kernel contained in a deb package is too low by 1. Prior to the bad commit, the kernel was built first, then the number in .version file was read out, and written into the debian control file. Now, the debian control file is created before the kernel is actually compiled, which is causing the version number mismatch. Let the mkdebian script pass KBUILD_BUILD_VERSION=${revision} to require the build system to use the specified version number. Fixes: b41d920acff8 ("kbuild: deb-pkg: split generating packaging and build") Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Doug Smythies <dsmythies@telus.net>
2018-11-11kconfig: merge_config: avoid false positive matches from comment linesMasahiro Yamada1-3/+4
The current SED_CONFIG_EXP could match to comment lines in config fragment files, especially when CONFIG_PREFIX_ is empty. For example, Buildroot uses empty prefixing; starting symbols with BR2_ is just convention. Make the sed expression more robust against false positives from comment lines. The new sed expression matches to only valid patterns. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Petr Vorel <petr.vorel@gmail.com> Reviewed-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
2018-11-10net: sched: cls_flower: validate nested enc_opts_policy to avoid warningJakub Kicinski1-1/+13
TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only currently contain further nested attributes, which are parsed by hand, so the policy is never actually used resulting in a W=1 build warning: net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=] enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = { Add the validation anyway to avoid potential bugs when other attributes are added and to make the attribute structure slightly more clear. Validation will also set extact to point to bad attribute on error. Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: mvneta: correct typoAlexandre Belloni1-2/+2
The reserved variable should be named reserved1. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09flow_dissector: do not dissect l4 ports for fragments배석진1-2/+2
Only first fragment has the sport/dport information, not the following ones. If we want consistent hash for all fragments, we need to ignore ports even for first fragment. This bug is visible for IPv6 traffic, if incoming fragments do not have a flow label, since skb_get_hash() will give different results for first fragment and following ones. It is also visible if any routing rule wants dissection and sport or dport. See commit 5e5d6fed3741 ("ipv6: route: dissect flow in input path if fib rules need it") for details. [edumazet] rewrote the changelog completely. Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends") Signed-off-by: 배석진 <soukjin.bae@samsung.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: qualcomm: rmnet: Fix incorrect assignment of real_devSubash Abhinov Kasiviswanathan1-3/+3
A null dereference was observed when a sysctl was being set from userspace and rmnet was stuck trying to complete some actions in the NETDEV_REGISTER callback. This is because the real_dev is set only after the device registration handler completes. sysctl call stack - <6> Unable to handle kernel NULL pointer dereference at virtual address 00000108 <2> pc : rmnet_vnd_get_iflink+0x1c/0x28 <2> lr : dev_get_iflink+0x2c/0x40 <2> rmnet_vnd_get_iflink+0x1c/0x28 <2> inet6_fill_ifinfo+0x15c/0x234 <2> inet6_ifinfo_notify+0x68/0xd4 <2> ndisc_ifinfo_sysctl_change+0x1b8/0x234 <2> proc_sys_call_handler+0xac/0x100 <2> proc_sys_write+0x3c/0x4c <2> __vfs_write+0x54/0x14c <2> vfs_write+0xcc/0x188 <2> SyS_write+0x60/0xc0 <2> el0_svc_naked+0x34/0x38 device register call stack - <2> notifier_call_chain+0x84/0xbc <2> raw_notifier_call_chain+0x38/0x48 <2> call_netdevice_notifiers_info+0x40/0x70 <2> call_netdevice_notifiers+0x38/0x60 <2> register_netdevice+0x29c/0x3d8 <2> rmnet_vnd_newlink+0x68/0xe8 <2> rmnet_newlink+0xa0/0x160 <2> rtnl_newlink+0x57c/0x6c8 <2> rtnetlink_rcv_msg+0x1dc/0x328 <2> netlink_rcv_skb+0xac/0x118 <2> rtnetlink_rcv+0x24/0x30 <2> netlink_unicast+0x158/0x1f0 <2> netlink_sendmsg+0x32c/0x338 <2> sock_sendmsg+0x44/0x60 <2> SyS_sendto+0x150/0x1ac <2> el0_svc_naked+0x34/0x38 Fixes: b752eff5be24 ("net: qualcomm: rmnet: Implement ndo_get_iflink") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: allow rx checksum offload configurationDmitry Bogdanov5-6/+18
RX Checksum offloads could not be configured and ignored netdev features flag for checksumming. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: invalid checksumm offload implementationDmitry Bogdanov2-30/+41
Packets with marked invalid IP/UDP/TCP checksums were considered as good by the driver. The error was in a logic, processing offload bits in RX descriptor. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: fixed enable unicast on 32 macvlanIgor Russkikh1-1/+1
Fixed a condition mistake due to which macvlans unicast item number 32 was not added in the unicast filter. The consequence is that when exactly 32 macvlans are created on NIC, the last created macvlan receives no traffic because its MAC was not registered in HW. Fixes: 94b3b542303f ("net: aquantia: vlan unicast address list correct handling") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Tested-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: fix potential IOMMU fault after driver unbindDmitry Bogdanov4-0/+35
IOMMU fault may occurr on unbind/bind or if_down/if_up sequence. Although driver disables the rings on down, this is not enough. Due to internal HW design, during subsequent initialization NIC sometimes may reuse RX descriptors cache and write to the host memory from the descriptor cache. That's get catched by IOMMU on host. This patch invalidates the descriptor cache in NIC on interface down to prevent writing to the cached descriptors and to the memory pointed in those descriptors. Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: synchronized flow control between mac/phyIgor Russkikh5-8/+50
Flow control statuses were not synchronized between blocks, that caused packets/link drop on some corner cases, when MAC sent PFC although Phy was not expecting these to come. Driver should readout the negotiated FC from phy and configure RX block accordigly. This is done on each link change event with information from FW. Fixes: 288551de45aa ("net: aquantia: Implement rx/tx flow control ethtools callback") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09clk: qcom: gcc: Fix board clock node nameVinod Koul1-1/+1
Device tree node name are not supposed to have "_" in them so fix the node name use of xo_board to xo-board Fixes: 652f1813c113 ("clk: qcom: gcc: Add global clock controller driver for QCS404") Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-11-09x86/cpu/vmware: Do not trace vmware_sched_clock()Steven Rostedt (VMware)1-1/+1
When running function tracing on a Linux guest running on VMware Workstation, the guest would crash. This is due to tracing of the sched_clock internal call of the VMware vmware_sched_clock(), which causes an infinite recursion within the tracing code (clock calls must not be traced). Make vmware_sched_clock() not traced by ftrace. Fixes: 80e9a4f21fd7c ("x86/vmware: Add paravirt sched clock") Reported-by: GwanYeong Kim <gy741.kim@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Alok Kataria <akataria@vmware.com> CC: GwanYeong Kim <gy741.kim@gmail.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> CC: virtualization@lists.linux-foundation.org CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20181109152207.4d3e7d70@gandalf.local.home
2018-11-09usb: typec: ucsi: add support for Cypress CCGxAjay Gupta3-0/+319
Latest NVIDIA GPU cards have a Cypress CCGx Type-C controller over I2C interface. This UCSI I2C driver uses I2C bus driver interface for communicating with Type-C controller. Signed-off-by: Ajay Gupta <ajayg@nvidia.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09serial: sh-sci: Fix could not remove dev_attr_rx_fifo_timeoutYoshihiro Shimoda1-2/+2
This patch fixes an issue that the sci_remove() could not remove dev_attr_rx_fifo_timeout because uart_remove_one_port() set the port->port.type to PORT_UNKNOWN. Reported-by: Hiromitsu Yamasaki <hiromitsu.yamasaki.ym@renesas.com> Fixes: 5d23188a473d ("serial: sh-sci: make RX FIFO parameters tunable via sysfs") Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-09i2c: nvidia-gpu: make pm_ops staticWolfram Sang1-1/+1
sparse rightfully says: warning: symbol 'gpu_i2c_driver_pm' was not declared. Should it be static? Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09i2c: add i2c bus driver for NVIDIA GPUAjay Gupta5-0/+403
Latest NVIDIA GPU card has USB Type-C interface. There is a Type-C controller which can be accessed over I2C. This driver adds I2C bus driver to communicate with Type-C controller. I2C client driver will be part of USB Type-C UCSI driver. Signed-off-by: Ajay Gupta <ajayg@nvidia.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [wsa: kept Makefile sorting] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09ext4: missing !bh check in ext4_xattr_inode_write()Vasily Averin1-0/+6
According to Ted Ts'o ext4_getblk() called in ext4_xattr_inode_write() should not return bh = NULL The only time that bh could be NULL, then, would be in the case of something really going wrong; a programming error elsewhere (perhaps a wild pointer dereference) or I/O error causing on-disk file system corruption (although that would be highly unlikely given that we had *just* allocated the blocks and so the metadata blocks in question probably would still be in the cache). Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.13
2018-11-09i2c: qcom-geni: Fix runtime PM mismatch with child devicesStephen Boyd1-7/+8
We need to enable runtime PM on this i2c controller before populating child devices with i2c_add_adapter(). Otherwise, if a child device uses runtime PM and stays runtime PM enabled we'll get the following warning at boot. Enabling runtime PM for inactive device (a98000.i2c) with active children [...] Call trace: pm_runtime_enable+0xd8/0xf8 geni_i2c_probe+0x440/0x460 platform_drv_probe+0x74/0xc8 [...] Let's move the runtime PM enabling and setup to before we add the adapter, so that this device can respond to runtime PM requests from children. Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09MAINTAINERS: Add entry for i2c-omap driverVignesh R1-0/+8
Add separate entry for i2c-omap and add my name as maintainer for this driver. Signed-off-by: Vignesh R <vigneshr@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09i2c: omap: Enable for ARCH_K3Vignesh R1-1/+1
Allow I2C_OMAP to be built for K3 platforms. Signed-off-by: Vignesh R <vigneshr@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09dt-bindings: i2c: omap: Add new compatible for AM654 SoCsVignesh R1-2/+6
AM654 SoCs have same I2C IP as OMAP SoCs. Add new compatible to handle AM654 SoCs. While at that reformat the existing compatible list for older SoCs to list one valid compatible per line. Signed-off-by: Vignesh R <vigneshr@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09xen: remove size limit of privcmd-buf mapping interfaceJuergen Gross1-18/+4
Currently the size of hypercall buffers allocated via /dev/xen/hypercall is limited to a default of 64 memory pages. For live migration of guests this might be too small as the page dirty bitmask needs to be sized according to the size of the guest. This means migrating a 8GB sized guest is already exhausting the default buffer size for the dirty bitmap. There is no sensible way to set a sane limit, so just remove it completely. The device node's usage is limited to root anyway, so there is no additional DOS scenario added by allowing unlimited buffers. While at it make the error path for the -ENOMEM case a little bit cleaner by setting n_pages to the number of successfully allocated pages instead of the target size. Fixes: c51b3c639e01f2 ("xen: add new hypercall buffer mapping device") Cc: <stable@vger.kernel.org> #4.18 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-11-09xen: fix xen_qlock_wait()Juergen Gross1-6/+8
Commit a856531951dc80 ("xen: make xen_qlock_wait() nestable") introduced a regression for Xen guests running fully virtualized (HVM or PVH mode). The Xen hypervisor wouldn't return from the poll hypercall with interrupts disabled in case of an interrupt (for PV guests it does). So instead of disabling interrupts in xen_qlock_wait() use a nesting counter to avoid calling xen_clear_irq_pending() in case xen_qlock_wait() is nested. Fixes: a856531951dc80 ("xen: make xen_qlock_wait() nestable") Cc: stable@vger.kernel.org Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-11-09block: make sure writesame bio is aligned with logical block sizeMing Lei1-1/+1
Obviously the created writesame bio has to be aligned with logical block size, and use bio_allowed_max_sectors() to retrieve this number. Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Fixes: b49a0871be31a745b2ef ("block: remove split code in blkdev_issue_{discard,write_same}") Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09block: cleanup __blkdev_issue_discard()Ming Lei1-17/+6
Cleanup __blkdev_issue_discard() a bit: - remove local variable of 'end_sect' - remove code block of 'fail' Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>