aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-08-21parisc: Fix boot failure of 64-bit kernelHelge Deller3-27/+15
Commit c8921d72e390 ("parisc: Fix and improve kernel stack unwinding") broke booting of 64-bit kernels. On 64-bit kernels function pointers are actually function descriptors which require dereferencing. In this patch we instead declare functions in assembly code which are referenced from C-code as external data pointers with the ENTRY() macro and thus can use a simple external reference to the functions. Signed-off-by: Helge Deller <deller@gmx.de> Fixes: c8921d72e390 ("parisc: Fix and improve kernel stack unwinding")
2018-08-17parisc: Consolidate unwind initialization callsHelge Deller4-57/+36
Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-17parisc: Update comments in syscall.S regarding wide userlandHelge Deller1-10/+3
We do support running 64-bit userspace processes, although there isn't yet full gcc and glibc support. Anyway, fix the comments to reflect the reality. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-17parisc: Fix ptraced 64-bit applications to call 64-bit syscallsHelge Deller1-4/+18
Fix the strace code path to call 64-bit syscalls in case we are executing by a 64-bit application. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-17parisc: Restore possibility to execute 64-bit applicationsHelge Deller6-39/+39
Executing 64-bit applications was broken. This patch restores this support and cleans up some code paths. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Fix and improve kernel stack unwindingHelge Deller10-232/+88
This patchset fixes and improves stack unwinding a lot: 1. Show backward stack traces with up to 30 callsites 2. Add callinfo to ENTRY_CFI() such that every assembler function will get an entry in the unwind table 3. Use constants instead of numbers in call_on_stack() 4. Do not depend on CONFIG_KALLSYMS to generate backtraces. 5. Speed up backtrace generation Make sure you have this patch to GNU as installed: https://sourceware.org/ml/binutils/2018-07/msg00474.html Without this patch, unwind info in the kernel is often wrong for various functions. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Remove unnecessary barriers from spinlock.hJohn David Anglin1-6/+2
Now that mb() is an instruction barrier, it will slow performance if we issue unnecessary barriers. The spinlock defines have a number of unnecessary barriers.  The __ldcw() define is both a hardware and compiler barrier.  The mb() barriers in the routines using __ldcw() serve no purpose. The only barrier needed is the one in arch_spin_unlock().  We need to ensure all accesses are complete prior to releasing the lock. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Remove ordered stores from syscall.SJohn David Anglin1-12/+12
Now that we use a sync prior to releasing the locks in syscall.S, we don't need the PA 2.0 ordered stores used to release some locks.  Using an ordered store, potentially slows the release and subsequent code. There are a number of other ordered stores and loads that serve no purpose.  I have converted these to normal stores. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: prefer _THIS_IP_ and _RET_IP_ statement expressionsNick Desaulniers1-2/+2
As part of the effort to reduce the code duplication between _THIS_IP_ and current_text_addr(), let's consolidate callers of current_text_addr() to use _THIS_IP_. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Add HAVE_REGS_AND_STACK_ACCESS_API featureHelge Deller3-0/+112
Some parts of the HAVE_REGS_AND_STACK_ACCESS_API feature is needed for the rseq syscall. This patch adds the most important parts, and as long as we don't support kprobes, we should be fine. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Drop architecture-specific ENOTSUP defineHelge Deller3-13/+2
parisc is the only Linux architecture which has defined a value for ENOTSUP. All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers. Having an own value for ENOTSUP which is different than EOPNOTSUPP often gives problems with userspace programs which expect both to be the same. One such example is a build error in the libuv package, as can be seen in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237. Since we dropped HP-UX support, there is no real benefit in keeping an own value for ENOTSUP. This patch drops the parisc value for ENOTSUP from the kernel sources. glibc needs no patch, it reuses the exported headers. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: use generic dma_noncoherent_opsChristoph Hellwig4-139/+16
Switch to the generic noncoherent direct mapping implementation. Fix sync_single_for_cpu to do skip the cache flush unless the transfer is to the device to match the more tested unmap_single path which should have the same cache coherency implications. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: always use flush_kernel_dcache_range for DMA cache maintainanceChristoph Hellwig1-3/+3
Current the S/G list based DMA ops use flush_kernel_vmap_range which contains a few UP optimizations, while the rest of the DMA operations uses flush_kernel_dcache_range. The single vs sg operations are supposed to have the same effect, so they should use the same routines. Use the more conservation version for now, but if people more familiar with parisc think the vmap version is generally fine for DMA we should switch all interfaces over to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: merge pcx_dma_ops and pcxl_dma_opsChristoph Hellwig4-59/+43
The only difference is that pcxl supports dma coherent allocations, while pcx only supports non-consistent allocations and otherwise fails. But dma_alloc* is not in the fast path, and merging these two allows an easy migration path to the generic dma-noncoherent implementation, so do it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-12Linux 4.18Linus Torvalds1-1/+1
2018-08-12Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds7-49/+69
Pull SCSI fixes from James Bottomley: "Eight fixes. The most important one is the mpt3sas fix which makes the driver work again on big endian systems. The rest are mostly minor error path or checker issues and the vmw_scsi one fixes a performance problem" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled scsi: mpt3sas: Swap I/O memory read value back to cpu endianness scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO scsi: fcoe: drop frames in ELS LOGO error path scsi: fcoe: fix use-after-free in fcoe_ctlr_els_send scsi: qedi: Fix a potential buffer overflow scsi: qla2xxx: Fix memory leak for allocating abort IOCB
2018-08-12init: rename and re-order boot_cpu_state_init()Linus Torvalds3-3/+3
This is purely a preparatory patch for upcoming changes during the 4.19 merge window. We have a function called "boot_cpu_state_init()" that isn't really about the bootup cpu state: that is done much earlier by the similarly named "boot_cpu_init()" (note lack of "state" in name). This function initializes some hotplug CPU state, and needs to run after the percpu data has been properly initialized. It even has a comment to that effect. Except it _doesn't_ actually run after the percpu data has been properly initialized. On x86 it happens to do that, but on at least arm and arm64, the percpu base pointers are initialized by the arch-specific 'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init(). This had some unexpected results, and in particular we have a patch pending for the merge window that did the obvious cleanup of using 'this_cpu_write()' in the cpu hotplug init code: - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); which is obviously the right thing to do. Except because of the ordering issue, it actually failed miserably and unexpectedly on arm64. So this just fixes the ordering, and changes the name of the function to be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu hotplug state, because the core CPU state was supposed to have already been done earlier. Marked for stable, since the (not yet merged) patch that will show this problem is marked for stable. Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-12Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-9/+32
Pull vfs fixes from Al Viro: "A bunch of race fixes, mostly around lazy pathwalk. All of it is -stable fodder, a large part going back to 2013" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make sure that __dentry_kill() always invalidates d_seq, unhashed or not fix __legitimize_mnt()/mntput() race fix mntput/mntput race root dentries need RCU-delayed freeing
2018-08-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds11-39/+46
Pull networking fixes from David Miller: "Last bit of straggler fixes... 1) Fix btf library licensing to LGPL, from Martin KaFai lau. 2) Fix error handling in bpf sockmap code, from Daniel Borkmann. 3) XDP cpumap teardown handling wrt. execution contexts, from Jesper Dangaard Brouer. 4) Fix loss of runtime PM on failed vlan add/del, from Ivan Khoronzhuk. 5) xen-netfront caches skb_shinfo(skb) across a __pskb_pull_tail() call, which potentially changes the skb's data buffer, and thus skb_shinfo(). Fix from Juergen Gross" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: xen/netfront: don't cache skb_shinfo() net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlan net: ethernet: ti: cpsw: clear all entries when delete vid xdp: fix bug in devmap teardown code path samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier xdp: fix bug in cpumap teardown code path bpf, sockmap: fix cork timeout for select due to epipe bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path bpf, sockmap: fix bpf_tcp_sendmsg sock error handling bpf: btf: Change tools/lib/bpf/btf to LGPL
2018-08-11xen/netfront: don't cache skb_shinfo()Juergen Gross1-4/+4
skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache its return value. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11Merge branch 'cpsw-runtime-pm-fix'David S. Miller2-15/+12
Grygorii Strashko says: ==================== net: ethernet: ti: cpsw: fix runtime pm while add/del reserved vid Here 2 not critical fixes for: - vlan ale table leak while error if deleting vlan (simplifies next fix) - runtime pm while try to set reserved vlan ==================== Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlanIvan Khoronzhuk1-4/+7
It's exclusive with normal behaviour but if try to set vlan to one of the reserved values is made, the cpsw runtime pm is broken. Fixes: a6c5d14f5136 ("drivers: net: cpsw: ndev: fix accessing to suspended device") Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: ethernet: ti: cpsw: clear all entries when delete vidIvan Khoronzhuk2-11/+5
In cases if some of the entries were not found in forwarding table while killing vlan, the rest not needed entries still left in the table. No need to stop, as entry was deleted anyway. So fix this by returning error only after all was cleaned. To implement this, return -ENOENT in cpsw_ale_del_mcast() as it's supposed to be. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10zram: remove BD_CAP_SYNCHRONOUS_IO with writeback featureMinchan Kim1-1/+14
If zram supports writeback feature, it's no longer a BD_CAP_SYNCHRONOUS_IO device beause zram does asynchronous IO operations for incompressible pages. Do not pretend to be synchronous IO device. It makes the system very sluggish due to waiting for IO completion from upper layers. Furthermore, it causes a user-after-free problem because swap thinks the opearion is done when the IO functions returns so it can free the page (e.g., lock_page_or_retry and goto out_release in do_swap_page) but in fact, IO is asynchronous so the driver could access a just freed page afterward. This patch fixes the problem. BUG: Bad page state in process qemu-system-x86 pfn:3dfab21 page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1 flags: 0x17fffc000000008(uptodate) raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set bad because of flags: 0x8(uptodate) CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G B 4.18.0-rc5+ #1 Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017 Call Trace: dump_stack+0x5c/0x7b bad_page+0xba/0x120 get_page_from_freelist+0x1016/0x1250 __alloc_pages_nodemask+0xfa/0x250 alloc_pages_vma+0x7c/0x1c0 do_swap_page+0x347/0x920 __handle_mm_fault+0x7b4/0x1110 handle_mm_fault+0xfc/0x1f0 __get_user_pages+0x12f/0x690 get_user_pages_unlocked+0x148/0x1f0 __gfn_to_pfn_memslot+0xff/0x3c0 [kvm] try_async_pf+0x87/0x230 [kvm] tdp_page_fault+0x132/0x290 [kvm] kvm_mmu_page_fault+0x74/0x570 [kvm] kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm] kvm_vcpu_ioctl+0x388/0x5d0 [kvm] do_vfs_ioctl+0xa2/0x630 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x55/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/ Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org [minchan@kernel.org: fix changelog, add comment] Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/ Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org Link: http://lkml.kernel.org/r/20180805233722.217347-1-minchan@kernel.org [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Tino Lehnig <tino.lehnig@contabo.de> Tested-by: Tino Lehnig <tino.lehnig@contabo.de> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> [4.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-10mm/memory.c: check return value of ioremap_protjie@chenjie6@huwei.com1-0/+3
ioremap_prot() can return NULL which could lead to an oops. Link: http://lkml.kernel.org/r/1533195441-58594-1-git-send-email-chenjie6@huawei.com Signed-off-by: chen jie <chenjie6@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Li Zefan <lizefan@huawei.com> Cc: chenjie <chenjie6@huawei.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-10lib/ubsan: remove null-pointer checksAndrey Ryabinin4-17/+0
With gcc-8 fsanitize=null become very noisy. GCC started to complain about things like &a->b, where 'a' is NULL pointer. There is no NULL dereference, we just calculate address to struct member. It's technically undefined behavior so UBSAN is correct to report it. But as long as there is no real NULL-dereference, I think, we should be fine. -fno-delete-null-pointer-checks compiler flag should protect us from any consequences. So let's just no use -fsanitize=null as it's not useful for us. If there is a real NULL-deref we will see crash. Even if userspace mapped something at NULL (root can do this), with things like SMAP should catch the issue. Link: http://lkml.kernel.org/r/20180802153209.813-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-10MAINTAINERS: GDB: update e-mail addressKieran Bingham1-1/+1
This entry was created with my personal e-mail address. Update this entry to my open-source kernel.org account. Link: http://lkml.kernel.org/r/20180806143904.4716-4-kieran.bingham@ideasonboard.com Signed-off-by: Kieran Bingham <kbingham@kernel.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-10Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds1-13/+28
Pull i2c fix from Wolfram Sang: "A single driver bugfix for I2C. The bug was found by systematically stress testing the driver, so I am confident to merge it that late in the cycle although it is probably unusually large" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: xlp9xx: Fix case where SSIF read transaction completes early
2018-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller8-20/+30
Daniel Borkmann says: ==================== pull-request: bpf 2018-08-10 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix cpumap and devmap on teardown as they're under RCU context and won't have same assumption as running under NAPI protection, from Jesper. 2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had a bug where socket error was not propagated correctly, from Daniel. 3) Fix incompatible libbpf header license for BTF code and match it before it gets officially released with the rest of libbpf which is LGPL-2.1, from Martin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09make sure that __dentry_kill() always invalidates d_seq, unhashed or notAl Viro1-5/+2
RCU pathwalk relies upon the assumption that anything that changes ->d_inode of a dentry will invalidate its ->d_seq. That's almost true - the one exception is that the final dput() of already unhashed dentry does *not* touch ->d_seq at all. Unhashing does, though, so for anything we'd found by RCU dcache lookup we are fine. Unfortunately, we can *start* with an unhashed dentry or jump into it. We could try and be careful in the (few) places where that could happen. Or we could just make the final dput() invalidate the damn thing, unhashed or not. The latter is much simpler and easier to backport, so let's do it that way. Reported-by: "Dae R. Jeong" <threeearcat@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09fix __legitimize_mnt()/mntput() raceAl Viro1-0/+14
__legitimize_mnt() has two problems - one is that in case of success the check of mount_lock is not ordered wrt preceding increment of refcount, making it possible to have successful __legitimize_mnt() on one CPU just before the otherwise final mntpu() on another, with __legitimize_mnt() not seeing mntput() taking the lock and mntput() not seeing the increment done by __legitimize_mnt(). Solved by a pair of barriers. Another is that failure of __legitimize_mnt() on the second read_seqretry() leaves us with reference that'll need to be dropped by caller; however, if that races with final mntput() we can end up with caller dropping rcu_read_lock() and doing mntput() to release that reference - with the first mntput() having freed the damn thing just as rcu_read_lock() had been dropped. Solution: in "do mntput() yourself" failure case grab mount_lock, check if MNT_DOOMED has been set by racing final mntput() that has missed our increment and if it has - undo the increment and treat that as "failure, caller doesn't need to drop anything" case. It's not easy to hit - the final mntput() has to come right after the first read_seqretry() in __legitimize_mnt() *and* manage to miss the increment done by __legitimize_mnt() before the second read_seqretry() in there. The things that are almost impossible to hit on bare hardware are not impossible on SMP KVM, though... Reported-by: Oleg Nesterov <oleg@redhat.com> Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09fix mntput/mntput raceAl Viro1-2/+12
mntput_no_expire() does the calculation of total refcount under mount_lock; unfortunately, the decrement (as well as all increments) are done outside of it, leading to false positives in the "are we dropping the last reference" test. Consider the following situation: * mnt is a lazy-umounted mount, kept alive by two opened files. One of those files gets closed. Total refcount of mnt is 2. On CPU 42 mntput(mnt) (called from __fput()) drops one reference, decrementing component * After it has looked at component #0, the process on CPU 0 does mntget(), incrementing component #0, gets preempted and gets to run again - on CPU 69. There it does mntput(), which drops the reference (component #69) and proceeds to spin on mount_lock. * On CPU 42 our first mntput() finishes counting. It observes the decrement of component #69, but not the increment of component #0. As the result, the total it gets is not 1 as it should've been - it's 0. At which point we decide that vfsmount needs to be killed and proceed to free it and shut the filesystem down. However, there's still another opened file on that filesystem, with reference to (now freed) vfsmount, etc. and we are screwed. It's not a wide race, but it can be reproduced with artificial slowdown of the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups. Fix consists of moving the refcount decrement under mount_lock; the tricky part is that we want (and can) keep the fast case (i.e. mount that still has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu() before that mntput(). IOW, if mntput() observes (under rcu_read_lock()) a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to be dropped. Reported-by: Jann Horn <jannh@google.com> Tested-by: Jann Horn <jannh@google.com> Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09Merge branch 'bpf-fix-cpu-and-devmap-teardown'Daniel Borkmann4-14/+21
Jesper Dangaard Brouer says: ==================== Removing entries from cpumap and devmap, goes through a number of syncronization steps to make sure no new xdp_frames can be enqueued. But there is a small chance, that xdp_frames remains which have not been flushed/processed yet. Flushing these during teardown, happens from RCU context and not as usual under RX NAPI context. The optimization introduced in commt 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi"), missed that the flush operation can also be called from RCU context. Thus, we cannot always use the xdp_return_frame_rx_napi call, which take advantage of the protection provided by XDP RX running under NAPI protection. The samples/bpf xdp_redirect_cpu have a --stress-mode, that is adjusted to easier reproduce (verified by Red Hat QA). ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09xdp: fix bug in devmap teardown code pathJesper Dangaard Brouer1-5/+9
Like cpumap teardown, the devmap teardown code also flush remaining xdp_frames, via bq_xmit_all() in case map entry is removed. The code can call xdp_return_frame_rx_napi, from the the wrong context, in-case ndo_xdp_xmit() fails. Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi") Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easierJesper Dangaard Brouer2-3/+3
The teardown race in cpumap is really hard to reproduce. These changes makes it easier to reproduce, for QA. The --stress-mode now have a case of a very small queue size of 8, that helps to trigger teardown flush to encounter a full queue, which results in calling xdp_return_frame API, in a non-NAPI protect context. Also increase MAX_CPUS, as my QA department have larger machines than me. Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09xdp: fix bug in cpumap teardown code pathJesper Dangaard Brouer1-6/+9
When removing a cpumap entry, a number of syncronization steps happen. Eventually the teardown code __cpu_map_entry_free is invoked from/via call_rcu. The teardown code __cpu_map_entry_free() flushes remaining xdp_frames, by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi(). The issues is that the teardown code is not running in the RX NAPI code path. Thus, it is not allowed to invoke the NAPI variant of xdp_return_frame. This bug was found and triggered by using the --stress-mode option to the samples/bpf program xdp_redirect_cpu. It is hard to trigger, because the ptr_ring have to be full and cpumap bulk queue max contains 8 packets, and a remote CPU is racing to empty the ptr_ring queue. Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi") Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds8-191/+101
Pull crypto fix from Herbert Xu: "This fixes a performance regression in arm64 NEON crypto as well as a crash in x86 aegis/morus on unsupported CPUs" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aegis,morus - Fix and simplify CPUID checks crypto: arm64 - revert NEON yield for fast AEAD implementations
2018-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds26-161/+178
Pull networking fixes from David Miller: 1) The real fix for the ipv6 route metric leak Sabrina was seeing, from Cong Wang. 2) Fix syzbot triggers AF_PACKET v3 ring buffer insufficient room conditions, from Willem de Bruijn. 3) vsock can reinitialize active work struct, fix from Cong Wang. 4) RXRPC keepalive generator can wedge a cpu, fix from David Howells. 5) Fix locking in AF_SMC ioctl, from Ursula Braun. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: dsa: slave: eee: Allow ports to use phylink net/smc: move sock lock in smc_ioctl() net/smc: allow sysctl rmem and wmem defaults for servers net/smc: no shutdown in state SMC_LISTEN net: aquantia: Fix IFF_ALLMULTI flag functionality rxrpc: Fix the keepalive generator [ver #2] net/mlx5e: Cleanup of dcbnl related fields net/mlx5e: Properly check if hairpin is possible between two functions vhost: reset metadata cache when initializing new IOTLB llc: use refcount_inc_not_zero() for llc_sap_find() dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart() tipc: fix an interrupt unsafe locking scenario vsock: split dwork to avoid reinitializations net: thunderx: check for failed allocation lmac->dmacs cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts packet: refine ring v3 block size test to hold one frame ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit ipv6: fix double refcount of fib6_metrics
2018-08-09i2c: xlp9xx: Fix case where SSIF read transaction completes earlyGeorge Cherian1-13/+28
During ipmi stress tests we see occasional failure of transactions at the boot time. This happens in the case of a I2C_M_RECV_LEN transactions, when the read transfer completes (with the initial read length of 34) before the driver gets a chance to handle interrupts. The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN transactions. The length is updated during the first interrupt, and the buffer contents are only copied during subsequent interrupts. In case of just one interrupt, we will complete the transaction without copying out the bytes from RX fifo. Update the code to drain the RX fifo after the length update, so that the transaction completes correctly in all cases. Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2018-08-08dsa: slave: eee: Allow ports to use phylinkAndrew Lunn1-2/+2
For a port to be able to use EEE, both the MAC and the PHY must support EEE. A phy can be provided by both a phydev or phylink. Verify at least one of these exist, not just phydev. Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08Merge branch 'smc-fixes'David S. Miller1-5/+10
Ursula Braun says: ==================== net/smc: fixes 2018-08-08 here are small fixes for SMC: The first patch makes sure, shutdown code is not executed for sockets in state SMC_LISTEN. The second patch resets send and receive buffer values for accepted sockets, since TCP buffer size optimizations for the internal CLC socket should not be forwarded to the outer SMC socket. The third patch solves a race between connect and ioctl reported by syzbot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: move sock lock in smc_ioctl()Ursula Braun1-3/+7
When an SMC socket is connecting it is decided whether fallback to TCP is needed. To avoid races between connect and ioctl move the sock lock before the use_fallback check. Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com Fixes: 1992d99882af ("net/smc: take sock lock in smc_ioctl()") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: allow sysctl rmem and wmem defaults for serversUrsula Braun1-0/+2
Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size optimizations for servers should be ignored. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: no shutdown in state SMC_LISTENUrsula Braun1-2/+1
Invoking shutdown for a socket in state SMC_LISTEN does not make sense. Nevertheless programs like syzbot fuzzing the kernel may try to do this. For SMC this means a socket refcounting problem. This patch makes sure a shutdown call for an SMC socket in state SMC_LISTEN simply returns with -ENOTCONN. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net: aquantia: Fix IFF_ALLMULTI flag functionalityDmitry Bogdanov1-1/+1
It was noticed that NIC always pass all multicast traffic to the host regardless of IFF_ALLMULTI flag on the interface. The rule in MC Filter Table in NIC, that is configured to accept any multicast packets, is turning on if IFF_MULTICAST flag is set on the interface. It leads to passing all multicast traffic to the host. This fix changes the condition to turn on that rule by checking IFF_ALLMULTI flag as it should. Fixes: b21f502f84be ("net:ethernet:aquantia: Fix for multicast filter handling.") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08rxrpc: Fix the keepalive generator [ver #2]David Howells7-89/+109
AF_RXRPC has a keepalive message generator that generates a message for a peer ~20s after the last transmission to that peer to keep firewall ports open. The implementation is incorrect in the following ways: (1) It mixes up ktime_t and time64_t types. (2) It uses ktime_get_real(), the output of which may jump forward or backward due to adjustments to the time of day. (3) If the current time jumps forward too much or jumps backwards, the generator function will crank the base of the time ring round one slot at a time (ie. a 1s period) until it catches up, spewing out VERSION packets as it goes. Fix the problem by: (1) Only using time64_t. There's no need for sub-second resolution. (2) Use ktime_get_seconds() rather than ktime_get_real() so that time isn't perceived to go backwards. (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two parts: (a) The "worker" function that manages the buckets and the timer. (b) The "dispatch" function that takes the pending peers and potentially transmits a keepalive packet before putting them back in the ring into the slot appropriate to the revised last-Tx time. (4) Taking everything that's pending out of the ring and splicing it into a temporary collector list for processing. In the case that there's been a significant jump forward, the ring gets entirely emptied and then the time base can be warped forward before the peers are processed. The warping can't happen if the ring isn't empty because the slot a peer is in is keepalive-time dependent, relative to the base time. (5) Limit the number of iterations of the bucket array when scanning it. (6) Set the timer to skip any empty slots as there's no point waking up if there's nothing to do yet. This can be triggered by an incoming call from a server after a reboot with AF_RXRPC and AFS built into the kernel causing a peer record to be set up before userspace is started. The system clock is then adjusted by userspace, thereby potentially causing the keepalive generator to have a meltdown - which leads to a message like: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23] ... Workqueue: krxrpcd rxrpc_peer_keepalive_worker EIP: lock_acquire+0x69/0x80 ... Call Trace: ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? _raw_spin_lock_bh+0x29/0x60 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? __lock_acquire+0x3d3/0x870 ? process_one_work+0x110/0x340 ? process_one_work+0x166/0x340 ? process_one_work+0x110/0x340 ? worker_thread+0x39/0x3c0 ? kthread+0xdb/0x110 ? cancel_delayed_work+0x90/0x90 ? kthread_stop+0x70/0x70 ? ret_from_fork+0x19/0x24 Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08Merge branch 'mlx5-fixes'David S. Miller3-25/+15
Saeed Mahameed says: ==================== Mellanox, mlx5e fixes 2018-08-07 I know it is late into 4.18 release, and this is why I am submitting only two mlx5e ethernet fixes. The first one from Or, is needed for -stable and it fixes hairpin for "same device" check. The second fix is a non risk fix from Huy which cleans up and improves error return value reporting for dcbnl_ieee_setapp. For -stable v4.16 - net/mlx5e: Properly check if hairpin is possible between two functions ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/mlx5e: Cleanup of dcbnl related fieldsHuy Nguyen2-21/+11
Remove unused netdev_registered_init/remove in en.h Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails. Remove extra white space Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Cc: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/mlx5e: Properly check if hairpin is possible between two functionsOr Gerlitz1-4/+4
The current check relies on function BDF addresses and can get us wrong e.g when two VFs are assigned into a VM and the PCI v-address is set by the hypervisor. Fixes: 5c65c564c962 ('net/mlx5e: Support offloading TC NIC hairpin flows') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Alaa Hleihel <alaa@mellanox.com> Tested-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08parisc: Define mb() and add memory barriers to assembler unlock sequencesJohn David Anglin4-0/+39
For years I thought all parisc machines executed loads and stores in order. However, Jeff Law recently indicated on gcc-patches that this is not correct. There are various degrees of out-of-order execution all the way back to the PA7xxx processor series (hit-under-miss). The PA8xxx series has full out-of-order execution for both integer operations, and loads and stores. This is described in the following article: http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml For this reason, we need to define mb() and to insert a memory barrier before the store unlocking spinlocks. This ensures that all memory accesses are complete prior to unlocking. The ldcw instruction performs the same function on entry. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller <deller@gmx.de>