aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/syscall-counts.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2018-08-13parisc: Fix and improve kernel stack unwindingHelge Deller10-232/+88
This patchset fixes and improves stack unwinding a lot: 1. Show backward stack traces with up to 30 callsites 2. Add callinfo to ENTRY_CFI() such that every assembler function will get an entry in the unwind table 3. Use constants instead of numbers in call_on_stack() 4. Do not depend on CONFIG_KALLSYMS to generate backtraces. 5. Speed up backtrace generation Make sure you have this patch to GNU as installed: https://sourceware.org/ml/binutils/2018-07/msg00474.html Without this patch, unwind info in the kernel is often wrong for various functions. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Remove unnecessary barriers from spinlock.hJohn David Anglin1-6/+2
Now that mb() is an instruction barrier, it will slow performance if we issue unnecessary barriers. The spinlock defines have a number of unnecessary barriers.  The __ldcw() define is both a hardware and compiler barrier.  The mb() barriers in the routines using __ldcw() serve no purpose. The only barrier needed is the one in arch_spin_unlock().  We need to ensure all accesses are complete prior to releasing the lock. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Remove ordered stores from syscall.SJohn David Anglin1-12/+12
Now that we use a sync prior to releasing the locks in syscall.S, we don't need the PA 2.0 ordered stores used to release some locks.  Using an ordered store, potentially slows the release and subsequent code. There are a number of other ordered stores and loads that serve no purpose.  I have converted these to normal stores. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: prefer _THIS_IP_ and _RET_IP_ statement expressionsNick Desaulniers1-2/+2
As part of the effort to reduce the code duplication between _THIS_IP_ and current_text_addr(), let's consolidate callers of current_text_addr() to use _THIS_IP_. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Add HAVE_REGS_AND_STACK_ACCESS_API featureHelge Deller3-0/+112
Some parts of the HAVE_REGS_AND_STACK_ACCESS_API feature is needed for the rseq syscall. This patch adds the most important parts, and as long as we don't support kprobes, we should be fine. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Drop architecture-specific ENOTSUP defineHelge Deller3-13/+2
parisc is the only Linux architecture which has defined a value for ENOTSUP. All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers. Having an own value for ENOTSUP which is different than EOPNOTSUPP often gives problems with userspace programs which expect both to be the same. One such example is a build error in the libuv package, as can be seen in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237. Since we dropped HP-UX support, there is no real benefit in keeping an own value for ENOTSUP. This patch drops the parisc value for ENOTSUP from the kernel sources. glibc needs no patch, it reuses the exported headers. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: use generic dma_noncoherent_opsChristoph Hellwig4-139/+16
Switch to the generic noncoherent direct mapping implementation. Fix sync_single_for_cpu to do skip the cache flush unless the transfer is to the device to match the more tested unmap_single path which should have the same cache coherency implications. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: always use flush_kernel_dcache_range for DMA cache maintainanceChristoph Hellwig1-3/+3
Current the S/G list based DMA ops use flush_kernel_vmap_range which contains a few UP optimizations, while the rest of the DMA operations uses flush_kernel_dcache_range. The single vs sg operations are supposed to have the same effect, so they should use the same routines. Use the more conservation version for now, but if people more familiar with parisc think the vmap version is generally fine for DMA we should switch all interfaces over to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: merge pcx_dma_ops and pcxl_dma_opsChristoph Hellwig4-59/+43
The only difference is that pcxl supports dma coherent allocations, while pcx only supports non-consistent allocations and otherwise fails. But dma_alloc* is not in the fast path, and merging these two allows an easy migration path to the generic dma-noncoherent implementation, so do it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-12Linux 4.18Linus Torvalds1-1/+1
2018-08-12init: rename and re-order boot_cpu_state_init()Linus Torvalds3-3/+3
This is purely a preparatory patch for upcoming changes during the 4.19 merge window. We have a function called "boot_cpu_state_init()" that isn't really about the bootup cpu state: that is done much earlier by the similarly named "boot_cpu_init()" (note lack of "state" in name). This function initializes some hotplug CPU state, and needs to run after the percpu data has been properly initialized. It even has a comment to that effect. Except it _doesn't_ actually run after the percpu data has been properly initialized. On x86 it happens to do that, but on at least arm and arm64, the percpu base pointers are initialized by the arch-specific 'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init(). This had some unexpected results, and in particular we have a patch pending for the merge window that did the obvious cleanup of using 'this_cpu_write()' in the cpu hotplug init code: - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); which is obviously the right thing to do. Except because of the ordering issue, it actually failed miserably and unexpectedly on arm64. So this just fixes the ordering, and changes the name of the function to be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu hotplug state, because the core CPU state was supposed to have already been done earlier. Marked for stable, since the (not yet merged) patch that will show this problem is marked for stable. Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-11xen/netfront: don't cache skb_shinfo()Juergen Gross1-4/+4
skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache its return value. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlanIvan Khoronzhuk1-4/+7
It's exclusive with normal behaviour but if try to set vlan to one of the reserved values is made, the cpsw runtime pm is broken. Fixes: a6c5d14f5136 ("drivers: net: cpsw: ndev: fix accessing to suspended device") Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: ethernet: ti: cpsw: clear all entries when delete vidIvan Khoronzhuk2-11/+5
In cases if some of the entries were not found in forwarding table while killing vlan, the rest not needed entries still left in the table. No need to stop, as entry was deleted anyway. So fix this by returning error only after all was cleaned. To implement this, return -ENOENT in cpsw_ale_del_mcast() as it's supposed to be. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10zram: remove BD_CAP_SYNCHRONOUS_IO with writeback featureMinchan Kim1-1/+14
If zram supports writeback feature, it's no longer a BD_CAP_SYNCHRONOUS_IO device beause zram does asynchronous IO operations for incompressible pages. Do not pretend to be synchronous IO device. It makes the system very sluggish due to waiting for IO completion from upper layers. Furthermore, it causes a user-after-free problem because swap thinks the opearion is done when the IO functions returns so it can free the page (e.g., lock_page_or_retry and goto out_release in do_swap_page) but in fact, IO is asynchronous so the driver could access a just freed page afterward. This patch fixes the problem. BUG: Bad page state in process qemu-system-x86 pfn:3dfab21 page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1 flags: 0x17fffc000000008(uptodate) raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set bad because of flags: 0x8(uptodate) CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G B 4.18.0-rc5+ #1 Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017 Call Trace: dump_stack+0x5c/0x7b bad_page+0xba/0x120 get_page_from_freelist+0x1016/0x1250 __alloc_pages_nodemask+0xfa/0x250 alloc_pages_vma+0x7c/0x1c0 do_swap_page+0x347/0x920 __handle_mm_fault+0x7b4/0x1110 handle_mm_fault+0xfc/0x1f0 __get_user_pages+0x12f/0x690 get_user_pages_unlocked+0x148/0x1f0 __gfn_to_pfn_memslot+0xff/0x3c0 [kvm] try_async_pf+0x87/0x230 [kvm] tdp_page_fault+0x132/0x290 [kvm] kvm_mmu_page_fault+0x74/0x570 [kvm] kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm] kvm_vcpu_ioctl+0x388/0x5d0 [kvm] do_vfs_ioctl+0xa2/0x630 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x55/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/ Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org [minchan@kernel.org: fix changelog, add comment] Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/ Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org Link: http://lkml.kernel.org/r/20180805233722.217347-1-minchan@kernel.org [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Tino Lehnig <tino.lehnig@contabo.de> Tested-by: Tino Lehnig <tino.lehnig@contabo.de> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> [4.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-10mm/memory.c: check return value of ioremap_protjie@chenjie6@huwei.com1-0/+3
ioremap_prot() can return NULL which could lead to an oops. Link: http://lkml.kernel.org/r/1533195441-58594-1-git-send-email-chenjie6@huawei.com Signed-off-by: chen jie <chenjie6@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Li Zefan <lizefan@huawei.com> Cc: chenjie <chenjie6@huawei.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-10lib/ubsan: remove null-pointer checksAndrey Ryabinin4-17/+0
With gcc-8 fsanitize=null become very noisy. GCC started to complain about things like &a->b, where 'a' is NULL pointer. There is no NULL dereference, we just calculate address to struct member. It's technically undefined behavior so UBSAN is correct to report it. But as long as there is no real NULL-dereference, I think, we should be fine. -fno-delete-null-pointer-checks compiler flag should protect us from any consequences. So let's just no use -fsanitize=null as it's not useful for us. If there is a real NULL-deref we will see crash. Even if userspace mapped something at NULL (root can do this), with things like SMAP should catch the issue. Link: http://lkml.kernel.org/r/20180802153209.813-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-10MAINTAINERS: GDB: update e-mail addressKieran Bingham1-1/+1
This entry was created with my personal e-mail address. Update this entry to my open-source kernel.org account. Link: http://lkml.kernel.org/r/20180806143904.4716-4-kieran.bingham@ideasonboard.com Signed-off-by: Kieran Bingham <kbingham@kernel.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-10smb3: create smb3 equivalent alias for cifs pseudo-xattrsSteve French1-2/+26
We really, really don't want to be encouraging people to use cifs (the dialect) since it is insecure, so to avoid confusion we want to move them to names which include 'smb3' instead of 'cifs' - so this simply creates an alias for the pseudo-xattrs e.g. can now do: getfattr -n user.smb3.creationtime /mnt1/file and getfattr -n user.smb3.dosattrib /mnt1/file and getfattr -n system.smb3_acl /mnt1/file instead of forcing you to use the string 'cifs' in these (e.g. getfattr -n system.cifs_acl /mnt1/file) Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-10x86/mm/pti: Move user W+X check into pti_finalize()Joerg Roedel3-5/+10
The user page-table gets the updated kernel mappings in pti_finalize(), which runs after the RO+X permissions got applied to the kernel page-table in mark_readonly(). But with CONFIG_DEBUG_WX enabled, the user page-table is already checked in mark_readonly() for insecure mappings. This causes false-positive warnings, because the user page-table did not get the updated mappings yet. Move the W+X check for the user page-table into pti_finalize() after it updated all required mappings. [ tglx: Folded !NX supported fix ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1533727000-9172-1-git-send-email-joro@8bytes.org
2018-08-10smb3: allow previous versions to be mounted with snapshot= mount parmSteve French2-0/+68
mounting with the "snapshots=" mount parm allows a read-only view of a previous version of a file system (see MS-SMB2 and "timewarp" tokens, section 2.2.13.2.6) based on the timestamp passed in on the snapshots mount parm. Add processing to optionally send this create context. Example output: /mnt1 is mounted with "snapshots=..." and will see an earlier version of the directory, with three fewer files than /mnt2 the current version of the directory. root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# cat /proc/mounts | grep cifs //172.22.149.186/public /mnt1 cifs ro,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=131748608570000000,actimeo=1 //172.22.149.186/public /mnt2 cifs rw,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1 root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1 EmptyDir newerdir root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1/newerdir root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2 EmptyDir file newerdir newestdir timestamp-trace.cap root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2/newerdir new-file-not-in-snapshot Snapshots are extremely useful for comparing previous versions of files or directories, and recovering from data corruptions or mistakes. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-08-10cifs: don't show domain= in mount output when domain is emptyRonnie Sahlberg1-1/+1
Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-10cifs: add missing support for ACLs in SMB 3.11Ronnie Sahlberg1-0/+5
We were missing the methods for get_acl and friends for the 3.11 dialect. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-10hwmon: (adt7475) Change show functions to return error data correctlyTokunori Ikegami1-2/+38
Change update device function to return an error pointer if needed, and report the error to user space. Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Chris Packham <chris.packham@alliedtelesis.co.nz> [groeck: Clarified/updated description] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-08-10hwmon: (adt7475) Change update functions to add error handlingTokunori Ikegami1-42/+145
I2C SMBus sometimes returns error codes. In the error case, measurement values are updated incorrectly. The sensor application then generates warning log messages and SNMP traps. To prevent this, add error handling into the update functions. Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Chris Packham <chris.packham@alliedtelesis.co.nz> [groeck: Update description] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-08-10hwmon: (adt7475) Change valid parameter to bool typeTokunori Ikegami1-1/+2
Currently the valid variable is of type char, but it is used as boolean. So let's change it to bool. Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Chris Packham <chris.packham@alliedtelesis.co.nz> [groeck: Update description] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-08-10hwmon: (adt7475) Split device update function to measure and limitsTokunori Ikegami1-101/+109
The update function reads both measurement and limit values. Those parts can be split so split them for a maintainability. Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Chris Packham <chris.packham@alliedtelesis.co.nz> [groeck: Clarify description] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-08-09smb3: enumerating snapshots was leaving part of the data off endSteve French1-7/+27
When enumerating snapshots, the last few bytes of the final snapshot could be left off since we were miscalculating the length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY) See MS-SMB2 section 2.2.32.2. In addition fixup the length used to allow smaller buffer to be passed in, in order to allow returning the size of the whole snapshot array more easily. Sample userspace output with a kernel patched with this (mounted to a Windows volume with two snapshots). Before this patch, the second snapshot would be missing a few bytes at the end. ~/cifs-2.6# ~/enum-snapshots /mnt/file press enter to issue the ioctl to retrieve snapshot information ... size of snapshot array = 102 Num snapshots: 2 Num returned: 2 Array Size: 102 Snapshot 0:@GMT-2018.06.30-19.34.17 Snapshot 1:@GMT-2018.06.30-19.33.37 CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09cifs: update smb2_queryfs() to use compoundingRonnie Sahlberg4-26/+131
Change smb2_queryfs() to use a Create/QueryInfo/Close compound request. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara <palcantara@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09cifs: update receive_encrypted_standard to handle compounded responsesRonnie Sahlberg4-43/+107
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara <palcantara@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09make sure that __dentry_kill() always invalidates d_seq, unhashed or notAl Viro1-5/+2
RCU pathwalk relies upon the assumption that anything that changes ->d_inode of a dentry will invalidate its ->d_seq. That's almost true - the one exception is that the final dput() of already unhashed dentry does *not* touch ->d_seq at all. Unhashing does, though, so for anything we'd found by RCU dcache lookup we are fine. Unfortunately, we can *start* with an unhashed dentry or jump into it. We could try and be careful in the (few) places where that could happen. Or we could just make the final dput() invalidate the damn thing, unhashed or not. The latter is much simpler and easier to backport, so let's do it that way. Reported-by: "Dae R. Jeong" <threeearcat@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09fix __legitimize_mnt()/mntput() raceAl Viro1-0/+14
__legitimize_mnt() has two problems - one is that in case of success the check of mount_lock is not ordered wrt preceding increment of refcount, making it possible to have successful __legitimize_mnt() on one CPU just before the otherwise final mntpu() on another, with __legitimize_mnt() not seeing mntput() taking the lock and mntput() not seeing the increment done by __legitimize_mnt(). Solved by a pair of barriers. Another is that failure of __legitimize_mnt() on the second read_seqretry() leaves us with reference that'll need to be dropped by caller; however, if that races with final mntput() we can end up with caller dropping rcu_read_lock() and doing mntput() to release that reference - with the first mntput() having freed the damn thing just as rcu_read_lock() had been dropped. Solution: in "do mntput() yourself" failure case grab mount_lock, check if MNT_DOOMED has been set by racing final mntput() that has missed our increment and if it has - undo the increment and treat that as "failure, caller doesn't need to drop anything" case. It's not easy to hit - the final mntput() has to come right after the first read_seqretry() in __legitimize_mnt() *and* manage to miss the increment done by __legitimize_mnt() before the second read_seqretry() in there. The things that are almost impossible to hit on bare hardware are not impossible on SMP KVM, though... Reported-by: Oleg Nesterov <oleg@redhat.com> Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09MIPS: Remove remnants of UASM_ISAPaul Burton2-2/+0
Commit 33679a50370d ("MIPS: uasm: Remove needless ISA abstraction") removed use of the MIPS_ISA preprocessor macro, but left a couple of unused definitions of it behind. Remove the dead code. Signed-off-by: Paul Burton <paul.burton@mips.com>
2018-08-09fix mntput/mntput raceAl Viro1-2/+12
mntput_no_expire() does the calculation of total refcount under mount_lock; unfortunately, the decrement (as well as all increments) are done outside of it, leading to false positives in the "are we dropping the last reference" test. Consider the following situation: * mnt is a lazy-umounted mount, kept alive by two opened files. One of those files gets closed. Total refcount of mnt is 2. On CPU 42 mntput(mnt) (called from __fput()) drops one reference, decrementing component * After it has looked at component #0, the process on CPU 0 does mntget(), incrementing component #0, gets preempted and gets to run again - on CPU 69. There it does mntput(), which drops the reference (component #69) and proceeds to spin on mount_lock. * On CPU 42 our first mntput() finishes counting. It observes the decrement of component #69, but not the increment of component #0. As the result, the total it gets is not 1 as it should've been - it's 0. At which point we decide that vfsmount needs to be killed and proceed to free it and shut the filesystem down. However, there's still another opened file on that filesystem, with reference to (now freed) vfsmount, etc. and we are screwed. It's not a wide race, but it can be reproduced with artificial slowdown of the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups. Fix consists of moving the refcount decrement under mount_lock; the tricky part is that we want (and can) keep the fast case (i.e. mount that still has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu() before that mntput(). IOW, if mntput() observes (under rcu_read_lock()) a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to be dropped. Reported-by: Jann Horn <jannh@google.com> Tested-by: Jann Horn <jannh@google.com> Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09hwmon: k10temp: Support Threadripper 2920X, 2970WX; simplify offset tableGuenter Roeck1-8/+2
All announced Threadripper 29xx models have a temperature offset of 27 degrees C. Simplify temperature offset table to match all 29xx Threadripper models with a single entry. Also simplify the table to match all 19xx Threadripper models with a single entry. This effectively drops entries for Threadripper 1910/1920/1950 which never saw the light of day. Cc: Michael Larabel <Michael@phoronix.com> Cc: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-08-09xdp: fix bug in devmap teardown code pathJesper Dangaard Brouer1-5/+9
Like cpumap teardown, the devmap teardown code also flush remaining xdp_frames, via bq_xmit_all() in case map entry is removed. The code can call xdp_return_frame_rx_napi, from the the wrong context, in-case ndo_xdp_xmit() fails. Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi") Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easierJesper Dangaard Brouer2-3/+3
The teardown race in cpumap is really hard to reproduce. These changes makes it easier to reproduce, for QA. The --stress-mode now have a case of a very small queue size of 8, that helps to trigger teardown flush to encounter a full queue, which results in calling xdp_return_frame API, in a non-NAPI protect context. Also increase MAX_CPUS, as my QA department have larger machines than me. Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09xdp: fix bug in cpumap teardown code pathJesper Dangaard Brouer1-6/+9
When removing a cpumap entry, a number of syncronization steps happen. Eventually the teardown code __cpu_map_entry_free is invoked from/via call_rcu. The teardown code __cpu_map_entry_free() flushes remaining xdp_frames, by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi(). The issues is that the teardown code is not running in the RX NAPI code path. Thus, it is not allowed to invoke the NAPI variant of xdp_return_frame. This bug was found and triggered by using the --stress-mode option to the samples/bpf program xdp_redirect_cpu. It is hard to trigger, because the ptr_ring have to be full and cpumap bulk queue max contains 8 packets, and a remote CPU is racing to empty the ptr_ring queue. Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi") Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09x86/relocs: Add __end_rodata_aligned to S_RELJoerg Roedel1-0/+1
This new symbol needs to be in the workaround-list for buggy binutils, otherwise the build with gcc-4.6 fails. Fixes: 39d668e04eda ('x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit') Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linux-Next Mailing List <linux-next@vger.kernel.org> Link: https://lkml.kernel.org/r/20180809094449.ddmnrkz7qkvo3j2x@suse.de
2018-08-09i2c: xlp9xx: Fix case where SSIF read transaction completes earlyGeorge Cherian1-13/+28
During ipmi stress tests we see occasional failure of transactions at the boot time. This happens in the case of a I2C_M_RECV_LEN transactions, when the read transfer completes (with the initial read length of 34) before the driver gets a chance to handle interrupts. The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN transactions. The length is updated during the first interrupt, and the buffer contents are only copied during subsequent interrupts. In case of just one interrupt, we will complete the transaction without copying out the bytes from RX fifo. Update the code to drain the RX fifo after the length update, so that the transaction completes correctly in all cases. Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2018-08-09s390/dasd: fix hanging offline processing due to canceled workerStefan Haberland1-2/+5
During offline processing two worker threads are canceled without freeing the device reference which leads to a hanging offline process. Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-09s390/dasd: fix panic for failed online processingStefan Haberland1-0/+3
Fix a panic that occurs for a device that got an error in dasd_eckd_check_characteristics() during online processing. For example the read configuration data command may have failed. If this error occurs the device is not being set online and the earlier invoked steps during online processing are rolled back. Therefore dasd_eckd_uncheck_device() is called which needs a valid private structure. But this pointer is not valid if dasd_eckd_check_characteristics() has failed. Check for a valid device->private pointer to prevent a panic. Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-09s390/mm: fix addressing exception after suspend/resumeGerald Schaefer1-1/+1
Commit c9b5ad546e7d "s390/mm: tag normal pages vs pages used in page tables" accidentally changed the logic in arch_set_page_states(), which is used by the suspend/resume code. set_page_stable(page, order) was changed to set_page_stable_dat(page, 0). After this, only the first page of higher order pages will be set to stable, and a write to one of the unstable pages will result in an addressing exception. Fix this by using "order" again, instead of "0". Fixes: c9b5ad546e7d ("s390/mm: tag normal pages vs pages used in page tables") Cc: stable@vger.kernel.org # 4.14+ Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-09rseq/selftests: add s390 supportVasily Gorbik3-0/+539
Implement support for s390 in the rseq selftests, in order to sanity check the recently enabled rseq syscall. The Implementation covers both 64-bit and 31-bit mode. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-08dsa: slave: eee: Allow ports to use phylinkAndrew Lunn1-2/+2
For a port to be able to use EEE, both the MAC and the PHY must support EEE. A phy can be provided by both a phydev or phylink. Verify at least one of these exist, not just phydev. Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: move sock lock in smc_ioctl()Ursula Braun1-3/+7
When an SMC socket is connecting it is decided whether fallback to TCP is needed. To avoid races between connect and ioctl move the sock lock before the use_fallback check. Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com Fixes: 1992d99882af ("net/smc: take sock lock in smc_ioctl()") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: allow sysctl rmem and wmem defaults for serversUrsula Braun1-0/+2
Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size optimizations for servers should be ignored. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: no shutdown in state SMC_LISTENUrsula Braun1-2/+1
Invoking shutdown for a socket in state SMC_LISTEN does not make sense. Nevertheless programs like syzbot fuzzing the kernel may try to do this. For SMC this means a socket refcounting problem. This patch makes sure a shutdown call for an SMC socket in state SMC_LISTEN simply returns with -ENOTCONN. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net: aquantia: Fix IFF_ALLMULTI flag functionalityDmitry Bogdanov1-1/+1
It was noticed that NIC always pass all multicast traffic to the host regardless of IFF_ALLMULTI flag on the interface. The rule in MC Filter Table in NIC, that is configured to accept any multicast packets, is turning on if IFF_MULTICAST flag is set on the interface. It leads to passing all multicast traffic to the host. This fix changes the condition to turn on that rule by checking IFF_ALLMULTI flag as it should. Fixes: b21f502f84be ("net:ethernet:aquantia: Fix for multicast filter handling.") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08rxrpc: Fix the keepalive generator [ver #2]David Howells7-89/+109
AF_RXRPC has a keepalive message generator that generates a message for a peer ~20s after the last transmission to that peer to keep firewall ports open. The implementation is incorrect in the following ways: (1) It mixes up ktime_t and time64_t types. (2) It uses ktime_get_real(), the output of which may jump forward or backward due to adjustments to the time of day. (3) If the current time jumps forward too much or jumps backwards, the generator function will crank the base of the time ring round one slot at a time (ie. a 1s period) until it catches up, spewing out VERSION packets as it goes. Fix the problem by: (1) Only using time64_t. There's no need for sub-second resolution. (2) Use ktime_get_seconds() rather than ktime_get_real() so that time isn't perceived to go backwards. (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two parts: (a) The "worker" function that manages the buckets and the timer. (b) The "dispatch" function that takes the pending peers and potentially transmits a keepalive packet before putting them back in the ring into the slot appropriate to the revised last-Tx time. (4) Taking everything that's pending out of the ring and splicing it into a temporary collector list for processing. In the case that there's been a significant jump forward, the ring gets entirely emptied and then the time base can be warped forward before the peers are processed. The warping can't happen if the ring isn't empty because the slot a peer is in is keepalive-time dependent, relative to the base time. (5) Limit the number of iterations of the bucket array when scanning it. (6) Set the timer to skip any empty slots as there's no point waking up if there's nothing to do yet. This can be triggered by an incoming call from a server after a reboot with AF_RXRPC and AFS built into the kernel causing a peer record to be set up before userspace is started. The system clock is then adjusted by userspace, thereby potentially causing the keepalive generator to have a meltdown - which leads to a message like: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23] ... Workqueue: krxrpcd rxrpc_peer_keepalive_worker EIP: lock_acquire+0x69/0x80 ... Call Trace: ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? _raw_spin_lock_bh+0x29/0x60 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? __lock_acquire+0x3d3/0x870 ? process_one_work+0x110/0x340 ? process_one_work+0x166/0x340 ? process_one_work+0x110/0x340 ? worker_thread+0x39/0x3c0 ? kthread+0xdb/0x110 ? cancel_delayed_work+0x90/0x90 ? kthread_stop+0x70/0x70 ? ret_from_fork+0x19/0x24 Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>