aboutsummaryrefslogtreecommitdiffstats
path: root/mm/truncate.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-03-06fat: fix uninit-memory access for partial initialized inodeOGAWA Hirofumi1-12/+7
When get an error in the middle of reading an inode, some fields in the inode might be still not initialized. And then the evict_inode path may access those fields via iput(). To fix, this makes sure that inode fields are initialized. Reported-by: syzbot+9d82b8de2992579da5d0@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/871rqnreqx.fsf@mail.parknet.co.jp Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06mm: avoid data corruption on CoW fault into PFN-mapped VMAKirill A. Shutemov1-8/+27
Jeff Moyer has reported that one of xfstests triggers a warning when run on DAX-enabled filesystem: WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50 ... wp_page_copy+0x98c/0xd50 (unreliable) do_wp_page+0xd8/0xad0 __handle_mm_fault+0x748/0x1b90 handle_mm_fault+0x120/0x1f0 __do_page_fault+0x240/0xd70 do_page_fault+0x38/0xd0 handle_page_fault+0x10/0x30 The warning happens on failed __copy_from_user_inatomic() which tries to copy data into a CoW page. This happens because of race between MADV_DONTNEED and CoW page fault: CPU0 CPU1 handle_mm_fault() do_wp_page() wp_page_copy() do_wp_page() madvise(MADV_DONTNEED) zap_page_range() zap_pte_range() ptep_get_and_clear_full() <TLB flush> __copy_from_user_inatomic() sees empty PTE and fails WARN_ON_ONCE(1) clear_page() The solution is to re-try __copy_from_user_inatomic() under PTL after checking that PTE is matches the orig_pte. The second copy attempt can still fail, like due to non-readable PTE, but there's nothing reasonable we can do about, except clearing the CoW page. Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> Cc: Justin He <Justin.He@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Link: http://lkml.kernel.org/r/20200218154151.13349-1-kirill.shutemov@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()Huang Ying1-2/+1
In set_pmd_migration_entry(), pmdp_invalidate() is used to change PMD atomically. But the PMD is read before that with an ordinary memory reading. If the THP (transparent huge page) is written between the PMD reading and pmdp_invalidate(), the PMD dirty bit may be lost, and cause data corruption. The race window is quite small, but still possible in theory, so need to be fixed. The race is fixed via using the return value of pmdp_invalidate() to get the original content of PMD, which is a read/modify/write atomic operation. So no THP writing can occur in between. The race has been introduced when the THP migration support is added in the commit 616b8371539a ("mm: thp: enable thp migration in generic path"). But this fix depends on the commit d52605d7cb30 ("mm: do not lose dirty and accessed bits in pmdp_invalidate()"). So it's easy to be backported after v4.16. But the race window is really small, so it may be fine not to backport the fix at all. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: http://lkml.kernel.org/r/20200220075220.2327056-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numaMel Gorman1-2/+36
: A user reported a bug against a distribution kernel while running a : proprietary workload described as "memory intensive that is not swapping" : that is expected to apply to mainline kernels. The workload is : read/write/modifying ranges of memory and checking the contents. They : reported that within a few hours that a bad PMD would be reported followed : by a memory corruption where expected data was all zeros. A partial : report of the bad PMD looked like : : [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd ffff8888157ba008(000002e0396009e2) : [ 5195.341184] ------------[ cut here ]------------ : [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35! : .... : [ 5195.410033] Call Trace: : [ 5195.410471] [<ffffffff811bc75d>] change_protection_range+0x7dd/0x930 : [ 5195.410716] [<ffffffff811d4be8>] change_prot_numa+0x18/0x30 : [ 5195.410918] [<ffffffff810adefe>] task_numa_work+0x1fe/0x310 : [ 5195.411200] [<ffffffff81098322>] task_work_run+0x72/0x90 : [ 5195.411246] [<ffffffff81077139>] exit_to_usermode_loop+0x91/0xc2 : [ 5195.411494] [<ffffffff81003a51>] prepare_exit_to_usermode+0x31/0x40 : [ 5195.411739] [<ffffffff815e56af>] retint_user+0x8/0x10 : : Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD : was a false detection. The bug does not trigger if automatic NUMA : balancing or transparent huge pages is disabled. : : The bug is due a race in change_pmd_range between a pmd_trans_huge and : pmd_nond_or_clear_bad check without any locks held. During the : pmd_trans_huge check, a parallel protection update under lock can have : cleared the PMD and filled it with a prot_numa entry between the transhuge : check and the pmd_none_or_clear_bad check. : : While this could be fixed with heavy locking, it's only necessary to make : a copy of the PMD on the stack during change_pmd_range and avoid races. A : new helper is created for this as the check if quite subtle and the : existing similar helpful is not suitable. This passed 154 hours of : testing (usually triggers between 20 minutes and 24 hours) without : detecting bad PMDs or corruption. A basic test of an autonuma-intensive : workload showed no significant change in behaviour. Although Mel withdrew the patch on the face of LKML comment https://lkml.org/lkml/2017/4/10/922 the race window aforementioned is still open, and we have reports of Linpack test reporting bad residuals after the bad PMD warning is observed. In addition to that, bad rss-counter and non-zero pgtables assertions are triggered on mm teardown for the task hitting the bad PMD. host kernel: mm/pgtable-generic.c:40: bad pmd 00000000b3152f68(8000000d2d2008e7) .... host kernel: BUG: Bad rss-counter state mm:00000000b583043d idx:1 val:512 host kernel: BUG: non-zero pgtables_bytes on freeing mm: 4096 The issue is observed on a v4.18-based distribution kernel, but the race window is expected to be applicable to mainline kernels, as well. [akpm@linux-foundation.org: fix comment typo, per Rafael] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/20200216191800.22423-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-05HID: hyperv: NULL check before some freeing functions is not needed.Lucas Tanure1-4/+2
Fix below warnings reported by coccicheck: drivers/hid/hid-hyperv.c:197:2-7: WARNING: NULL check before some freeing functions is not needed. drivers/hid/hid-hyperv.c:211:2-7: WARNING: NULL check before some freeing functions is not needed. Signed-off-by: Lucas Tanure <tanure@linux.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-03-05Hyper-V: add myself as a maintainerWei Liu1-0/+1
Signed-off-by: Wei Liu <wei.liu@kernel.org> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2020-03-05Hyper-V: Drop Sasha Levin from the Hyper-V maintainersSasha Levin1-1/+0
Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-03-03dm: bump version of core and various targetsMike Snitzer7-8/+8
Changes made during the 5.6 cycle warrant bumping the version number for DM core and the targets modified by this commit. It should be noted that dm-thin, dm-crypt and dm-raid already had their target version bumped during the 5.6 merge window. Signed-off-by; Mike Snitzer <snitzer@redhat.com>
2020-03-03dm: fix congested_fn for request-based deviceHou Tao1-11/+10
We neither assign congested_fn for requested-based blk-mq device nor implement it correctly. So fix both. Also, remove incorrect comment from dm_init_normal_md_queue and rename it to dm_init_congested_fn. Fixes: 4aa9c692e052 ("bdi: separate out congested state into a separate struct") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-03dm integrity: use dm_bio_record and dm_bio_restoreMike Snitzer1-23/+9
In cases where dec_in_flight() has to requeue the integrity_bio_wait work to transfer the rest of the data, the bio's __bi_remaining might already have been decremented to 0, e.g.: if bio passed to underlying data device was split via blk_queue_split(). Use dm_bio_{record,restore} rather than effectively open-coding them in dm-integrity -- these methods now manage __bi_remaining too. Depends-on: f7f0b057a9c1 ("dm bio record: save/restore bi_end_io and bi_integrity") Reported-by: Daniel Glöckner <dg@emlix.com> Suggested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-03dm bio record: save/restore bi_end_io and bi_integrityMike Snitzer1-0/+15
Also, save/restore __bi_remaining in case the bio was used in a BIO_CHAIN (e.g. due to blk_queue_split). Suggested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-01Linux 5.6-rc4Linus Torvalds1-1/+1
2020-03-01KVM: VMX: check descriptor table exits on instruction emulationOliver Upton1-0/+15
KVM emulates UMIP on hardware that doesn't support it by setting the 'descriptor table exiting' VM-execution control and performing instruction emulation. When running nested, this emulation is broken as KVM refuses to emulate L2 instructions by default. Correct this regression by allowing the emulation of descriptor table instructions if L1 hasn't requested 'descriptor table exiting'. Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Reported-by: Jan Kiszka <jan.kiszka@web.de> Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-29ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()Dan Carpenter1-3/+3
If sbi->s_flex_groups_allocated is zero and the first allocation fails then this code will crash. The problem is that "i--" will set "i" to -1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1 is type promoted to unsigned and becomes UINT_MAX. Since UINT_MAX is more than zero, the condition is true so we call kvfree(new_groups[-1]). The loop will carry on freeing invalid memory until it crashes. Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access") Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-29macintosh: therm_windtunnel: fix regression when instantiating devicesWolfram Sang1-21/+31
Removing attach_adapter from this driver caused a regression for at least some machines. Those machines had the sensors described in their DT, too, so they didn't need manual creation of the sensor devices. The old code worked, though, because manual creation came first. Creation of DT devices then failed later and caused error logs, but the sensors worked nonetheless because of the manually created devices. When removing attach_adaper, manual creation now comes later and loses the race. The sensor devices were already registered via DT, yet with another binding, so the driver could not be bound to it. This fix refactors the code to remove the race and only manually creates devices if there are no DT nodes present. Also, the DT binding is updated to match both, the DT and manually created devices. Because we don't know which device creation will be used at runtime, the code to start the kthread is moved to do_probe() which will be called by both methods. Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter") Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723 Reported-by: Erhard Furtner <erhard_f@mailbox.org> Tested-by: Erhard Furtner <erhard_f@mailbox.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org # v4.19+
2020-02-29jbd2: fix data races at struct journal_headQian Cai1-4/+4
journal_head::b_transaction and journal_head::b_next_transaction could be accessed concurrently as noticed by KCSAN, LTP: starting fsync04 /dev/zero: Can't open blockdev EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) ================================================================== BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2] write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70: __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2] __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569 jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2] (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034 kjournald2+0x13b/0x450 [jbd2] kthread+0x1cd/0x1f0 ret_from_fork+0x27/0x50 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68: jbd2_write_access_granted+0x1b2/0x250 [jbd2] jbd2_write_access_granted at fs/jbd2/transaction.c:1155 jbd2_journal_get_write_access+0x2c/0x60 [jbd2] __ext4_journal_get_write_access+0x50/0x90 [ext4] ext4_mb_mark_diskspace_used+0x158/0x620 [ext4] ext4_mb_new_blocks+0x54f/0xca0 [ext4] ext4_ind_map_blocks+0xc79/0x1b40 [ext4] ext4_map_blocks+0x3b4/0x950 [ext4] _ext4_get_block+0xfc/0x270 [ext4] ext4_get_block+0x3b/0x50 [ext4] __block_write_begin_int+0x22e/0xae0 __block_write_begin+0x39/0x50 ext4_write_begin+0x388/0xb50 [ext4] generic_perform_write+0x15d/0x290 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe 5 locks held by fsync04/25724: #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260 #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4] #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2] #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4] #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2] irq event stamp: 1407125 hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790 softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0 Reported by Kernel Concurrency Sanitizer on: CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 The plain reads are outside of jh->b_state_lock critical section which result in data races. Fix them by adding pairs of READ|WRITE_ONCE(). Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-29x86/mm: Fix dump_pagetables with Xen PVJuergen Gross1-6/+1
Commit 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use walk_page_range") broke Xen PV guests as the hypervisor reserved hole in the memory map was not taken into account. Fix that by starting the kernel range only at GUARD_HOLE_END_ADDR. Fixes: 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use walk_page_range") Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Julien Grall <julien@xen.org> Link: https://lkml.kernel.org/r/20200221103851.7855-1-jgross@suse.com
2020-02-29x86/ioperm: Add new paravirt function update_io_bitmap()Juergen Gross6-2/+50
Commit 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as well") reworked the iopl syscall to use I/O bitmaps. Unfortunately this broke Xen PV domains using that syscall as there is currently no I/O bitmap support in PV domains. Add I/O bitmap support via a new paravirt function update_io_bitmap which Xen PV domains can use to update their I/O bitmaps via a hypercall. Fixes: 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as well") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> # 5.5 Link: https://lkml.kernel.org/r/20200218154712.25490-1-jgross@suse.com
2020-02-28kvm: x86: Limit the number of "kvm: disabled by bios" messagesErwan Velu1-2/+2
In older version of systemd(219), at boot time, udevadm is called with : /usr/bin/udevadm trigger --type=devices --action=add" This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent, leading to the "kvm: disabled by bios" message in case of your Bios disabled the virtualization extensions. On a modern system running up to 256 CPU threads, this pollutes the Kernel logs. This patch offers to ratelimit this message to avoid any userspace program triggering this uevent printing this message too often. This patch is only a workaround but greatly reduce the pollution without breaking the current behavior of printing a message if some try to instantiate KVM on a system that doesn't support it. Note that recent versions of systemd (>239) do not have trigger this behavior. This patch will be useful at least for some using older systemd with recent Kernels. Signed-off-by: Erwan Velu <e.velu@criteo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: x86: avoid useless copy of cpufreq policyPaolo Bonzini1-5/+5
struct cpufreq_policy is quite big and it is not a good idea to allocate one on the stack. Just use cpufreq_cpu_get and cpufreq_cpu_put which is even simpler. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: allow disabling -WerrorPaolo Bonzini2-1/+14
Restrict -Werror to well-tested configurations and allow disabling it via Kconfig. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: x86: allow compiling as non-module with W=1Valdis Klētnieks2-0/+4
Compile error with CONFIG_KVM_INTEL=y and W=1: CC arch/x86/kvm/vmx/vmx.o arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=] 68 | static const struct x86_cpu_id vmx_cpu_id[] = { | ^~~~~~~~~~ cc1: all warnings being treated as errors When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a reference to the structure (or any code at all). This makes W=1 compiles unhappy. Wrap both in a #ifdef to avoid the issue. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> [Do the same for CONFIG_KVM_AMD. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipisWanpeng Li1-12/+21
Nick Desaulniers Reported: When building with: $ make CC=clang arch/x86/ CFLAGS=-Wframe-larger-than=1000 The following warning is observed: arch/x86/kernel/kvm.c:494:13: warning: stack frame size of 1064 bytes in function 'kvm_send_ipi_mask_allbutself' [-Wframe-larger-than=] static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector) ^ Debugging with: https://github.com/ClangBuiltLinux/frame-larger-than via: $ python3 frame_larger_than.py arch/x86/kernel/kvm.o \ kvm_send_ipi_mask_allbutself points to the stack allocated `struct cpumask newmask` in `kvm_send_ipi_mask_allbutself`. The size of a `struct cpumask` is potentially large, as it's CONFIG_NR_CPUS divided by BITS_PER_LONG for the target architecture. CONFIG_NR_CPUS for X86_64 can be as high as 8192, making a single instance of a `struct cpumask` 1024 B. This patch fixes it by pre-allocate 1 cpumask variable per cpu and use it for both pv tlb and pv ipis.. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: Introduce pv check helpersWanpeng Li1-10/+24
Introduce some pv check helpers for consistency. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: let declaration of kvm_get_running_vcpus match implementationChristian Borntraeger1-1/+1
Sparse notices that declaration and implementation do not match: arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: warning: incorrect type in return expression (different address spaces) arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: expected struct kvm_vcpu [noderef] <asn:3> ** arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: got struct kvm_vcpu *[noderef] <asn:3> * Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: SVM: allocate AVIC data structures based on kvm_amd module parameterPaolo Bonzini1-1/+2
Even if APICv is disabled at startup, the backing page and ir_list need to be initialized in case they are needed later. The only case in which this can be skipped is for userspace irqchip, and that must be done because avic_init_backing_page dereferences vcpu->arch.apic (which is NULL for userspace irqchip). Tested-by: rmuncrief@humanavance.com Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-27MAINTAINERS: Correct Cadence PCI driver pathLukas Bulwahn1-1/+1
de80f95ccb9c ("PCI: cadence: Move all files to per-device cadence directory") moved files of the PCI cadence drivers, but did not update the MAINTAINERS entry. Since then, ./scripts/get_maintainer.pl --self-test complains: warning: no file matches F: drivers/pci/controller/pcie-cadence* Repair the MAINTAINERS entry. Link: https://lore.kernel.org/r/20200221185402.4703-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-02-27dm zoned: Fix reference counter initial value of chunk worksShin'ichiro Kawasaki1-4/+4
Dm-zoned initializes reference counters of new chunk works with zero value and refcount_inc() is called to increment the counter. However, the refcount_inc() function handles the addition to zero value as an error and triggers the warning as follows: refcount_t: addition on 0; use-after-free. WARNING: CPU: 7 PID: 1506 at lib/refcount.c:25 refcount_warn_saturate+0x68/0xf0 ... CPU: 7 PID: 1506 Comm: systemd-udevd Not tainted 5.4.0+ #134 ... Call Trace: dmz_map+0x2d2/0x350 [dm_zoned] __map_bio+0x42/0x1a0 __split_and_process_non_flush+0x14a/0x1b0 __split_and_process_bio+0x83/0x240 ? kmem_cache_alloc+0x165/0x220 dm_process_bio+0x90/0x230 ? generic_make_request_checks+0x2e7/0x680 dm_make_request+0x3e/0xb0 generic_make_request+0xcf/0x320 ? memcg_drain_all_list_lrus+0x1c0/0x1c0 submit_bio+0x3c/0x160 ? guard_bio_eod+0x2c/0x130 mpage_readpages+0x182/0x1d0 ? bdev_evict_inode+0xf0/0xf0 read_pages+0x6b/0x1b0 __do_page_cache_readahead+0x1ba/0x1d0 force_page_cache_readahead+0x93/0x100 generic_file_read_iter+0x83a/0xe40 ? __seccomp_filter+0x7b/0x670 new_sync_read+0x12a/0x1c0 vfs_read+0x9d/0x150 ksys_read+0x5f/0xe0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... After this warning, following refcount API calls for the counter all fail to change the counter value. Fix this by setting the initial reference counter value not zero but one for the new chunk works. Instead, do not call refcount_inc() via dmz_get_chunk_work() for the new chunks works. The failure was observed with linux version 5.4 with CONFIG_REFCOUNT_FULL enabled. Refcount rework was merged to linux version 5.5 by the commit 168829ad09ca ("Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"). After this commit, CONFIG_REFCOUNT_FULL was removed and the failure was observed regardless of kernel configuration. Linux version 4.20 merged the commit 092b5648760a ("dm zoned: target: use refcount_t for dm zoned reference counters"). Before this commit, dm zoned used atomic_t APIs which does not check addition to zero, then this fix is not necessary. Fixes: 092b5648760a ("dm zoned: target: use refcount_t for dm zoned reference counters") Cc: stable@vger.kernel.org # 5.4+ Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-02-27dm writecache: verify watermark during resumeMikulas Patocka1-2/+10
Verify the watermark upon resume - so that if the target is reloaded with lower watermark, it will start the cleanup process immediately. Fixes: 48debafe4f2f ("dm: add writecache target") Cc: stable@vger.kernel.org # 4.18+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-02-27dm: report suspended device during destroyMikulas Patocka3-8/+7
The function dm_suspended returns true if the target is suspended. However, when the target is being suspended during unload, it returns false. An example where this is a problem: the test "!dm_suspended(wc->ti)" in writecache_writeback is not sufficient, because dm_suspended returns zero while writecache_suspend is in progress. As is, without an enhanced dm_suspended, simply switching from flush_workqueue to drain_workqueue still emits warnings: workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries writecache_suspend calls flush_workqueue(wc->writeback_wq) - this function flushes the current work. However, the workqueue may re-queue itself and flush_workqueue doesn't wait for re-queued works to finish. Because of this - the function writecache_writeback continues execution after the device was suspended and then concurrently with writecache_dtr, causing a crash in writecache_writeback. We must use drain_workqueue - that waits until the work and all re-queued works finish. As a prereq for switching to drain_workqueue, this commit fixes dm_suspended to return true after the presuspend hook and before the postsuspend hook - just like during a normal suspend. It allows simplifying the dm-integrity and dm-writecache targets so that they don't have to maintain suspended flags on their own. With this change use of drain_workqueue() can be used effectively. This change was tested with the lvm2 testsuite and cryptsetup testsuite and the are no regressions. Fixes: 48debafe4f2f ("dm: add writecache target") Cc: stable@vger.kernel.org # 4.18+ Reported-by: Corey Marthaler <cmarthal@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-02-27io_uring: fix 32-bit compatability with sendmsg/recvmsgJens Axboe1-0/+10
We must set MSG_CMSG_COMPAT if we're in compatability mode, otherwise the iovec import for these commands will not do the right thing and fail the command with -EINVAL. Found by running the test suite compiled as 32-bit. Cc: stable@vger.kernel.org Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()") Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-27net: dsa: mv88e6xxx: Fix masking of egress portAndrew Lunn1-2/+2
Add missing ~ to the usage of the mask. Reported-by: Kevin Benson <Kevin.Benson@zii.aero> Reported-by: Chris Healy <Chris.Healy@zii.aero> Fixes: 5c74c54ce6ff ("net: dsa: mv88e6xxx: Split monitor port configuration") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27mlxsw: pci: Wait longer before accessing the device after resetAmit Cohen1-1/+1
During initialization the driver issues a reset to the device and waits for 100ms before checking if the firmware is ready. The waiting is necessary because before that the device is irresponsive and the first read can result in a completion timeout. While 100ms is sufficient for Spectrum-1 and Spectrum-2, it is insufficient for Spectrum-3. Fix this by increasing the timeout to 200ms. Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC") Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27sfc: fix timestamp reconstruction at 16-bit rollover pointsAlex Maftei (amaftei)1-3/+35
We can't just use the top bits of the last sync event as they could be off-by-one every 65,536 seconds, giving an error in reconstruction of 65,536 seconds. This patch uses the difference in the bottom 16 bits (mod 2^16) to calculate an offset that needs to be applied to the last sync event to get to the current time. Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com> Acked-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27vsock: fix potential deadlock in transport->release()Stefano Garzarella3-13/+12
Some transports (hyperv, virtio) acquire the sock lock during the .release() callback. In the vsock_stream_connect() we call vsock_assign_transport(); if the socket was previously assigned to another transport, the vsk->transport->release() is called, but the sock lock is already held in the vsock_stream_connect(), causing a deadlock reported by syzbot: INFO: task syz-executor280:9768 blocked for more than 143 seconds. Not tainted 5.6.0-rc1-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor280 D27912 9768 9766 0x00000000 Call Trace: context_switch kernel/sched/core.c:3386 [inline] __schedule+0x934/0x1f90 kernel/sched/core.c:4082 schedule+0xdc/0x2b0 kernel/sched/core.c:4156 __lock_sock+0x165/0x290 net/core/sock.c:2413 lock_sock_nested+0xfe/0x120 net/core/sock.c:2938 virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832 vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454 vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288 __sys_connect_file+0x161/0x1c0 net/socket.c:1857 __sys_connect+0x174/0x1b0 net/socket.c:1874 __do_sys_connect net/socket.c:1885 [inline] __se_sys_connect net/socket.c:1882 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:1882 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe To avoid this issue, this patch remove the lock acquiring in the .release() callback of hyperv and virtio transports, and it holds the lock when we call vsk->transport->release() in the vsock core. Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com Fixes: 408624af4c89 ("vsock: use local transport when it is loaded") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27unix: It's CONFIG_PROC_FS not CONFIG_PROCFSDavid S. Miller1-1/+1
Fixes: 3a12500ed5dd ("unix: define and set show_fdinfo only if procfs is enabled") Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: rmnet: fix packet forwarding in rmnet bridge modeTaehee Yoo1-0/+3
Packet forwarding is not working in rmnet bridge mode. Because when a packet is forwarded, skb_push() for an ethernet header is needed. But it doesn't call skb_push(). So, the ethernet header will be lost. Test commands: modprobe rmnet ip netns add nst ip netns add nst2 ip link add veth0 type veth peer name veth1 ip link add veth2 type veth peer name veth3 ip link set veth1 netns nst ip link set veth3 netns nst2 ip link add rmnet0 link veth0 type rmnet mux_id 1 ip link set veth2 master rmnet0 ip link set veth0 up ip link set veth2 up ip link set rmnet0 up ip a a 192.168.100.1/24 dev rmnet0 ip netns exec nst ip link set veth1 up ip netns exec nst ip a a 192.168.100.2/24 dev veth1 ip netns exec nst2 ip link set veth3 up ip netns exec nst2 ip a a 192.168.100.3/24 dev veth3 ip netns exec nst2 ping 192.168.100.2 Fixes: 60d58f971c10 ("net: qualcomm: rmnet: Implement bridge mode") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: rmnet: fix bridge mode bugsTaehee Yoo4-77/+64
In order to attach a bridge interface to the rmnet interface, "master" operation is used. (e.g. ip link set dummy1 master rmnet0) But, in the rmnet_add_bridge(), which is a callback of ->ndo_add_slave() doesn't register lower interface. So, ->ndo_del_slave() doesn't work. There are other problems too. 1. It couldn't detect circular upper/lower interface relationship. 2. It couldn't prevent stack overflow because of too deep depth of upper/lower interface 3. It doesn't check the number of lower interfaces. 4. Panics because of several reasons. The root problem of these issues is actually the same. So, in this patch, these all problems will be fixed. Test commands: modprobe rmnet ip link add dummy0 type dummy ip link add rmnet0 link dummy0 type rmnet mux_id 1 ip link add dummy1 master rmnet0 type dummy ip link add dummy2 master rmnet0 type dummy ip link del rmnet0 ip link del dummy2 ip link del dummy1 Splat looks like: [ 41.867595][ T1164] general protection fault, probably for non-canonical address 0xdffffc0000000101I [ 41.869993][ T1164] KASAN: null-ptr-deref in range [0x0000000000000808-0x000000000000080f] [ 41.872950][ T1164] CPU: 0 PID: 1164 Comm: ip Not tainted 5.6.0-rc1+ #447 [ 41.873915][ T1164] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 41.875161][ T1164] RIP: 0010:rmnet_unregister_bridge.isra.6+0x71/0xf0 [rmnet] [ 41.876178][ T1164] Code: 48 89 ef 48 89 c6 5b 5d e9 fc fe ff ff e8 f7 f3 ff ff 48 8d b8 08 08 00 00 48 ba 00 7 [ 41.878925][ T1164] RSP: 0018:ffff8880c4d0f188 EFLAGS: 00010202 [ 41.879774][ T1164] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000101 [ 41.887689][ T1164] RDX: dffffc0000000000 RSI: ffffffffb8cf64f0 RDI: 0000000000000808 [ 41.888727][ T1164] RBP: ffff8880c40e4000 R08: ffffed101b3c0e3c R09: 0000000000000001 [ 41.889749][ T1164] R10: 0000000000000001 R11: ffffed101b3c0e3b R12: 1ffff110189a1e3c [ 41.890783][ T1164] R13: ffff8880c4d0f200 R14: ffffffffb8d56160 R15: ffff8880ccc2c000 [ 41.891794][ T1164] FS: 00007f4300edc0c0(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000 [ 41.892953][ T1164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.893800][ T1164] CR2: 00007f43003bc8c0 CR3: 00000000ca53e001 CR4: 00000000000606f0 [ 41.894824][ T1164] Call Trace: [ 41.895274][ T1164] ? rcu_is_watching+0x2c/0x80 [ 41.895895][ T1164] rmnet_config_notify_cb+0x1f7/0x590 [rmnet] [ 41.896687][ T1164] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet] [ 41.897611][ T1164] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet] [ 41.898508][ T1164] ? __module_text_address+0x13/0x140 [ 41.899162][ T1164] notifier_call_chain+0x90/0x160 [ 41.899814][ T1164] rollback_registered_many+0x660/0xcf0 [ 41.900544][ T1164] ? netif_set_real_num_tx_queues+0x780/0x780 [ 41.901316][ T1164] ? __lock_acquire+0xdfe/0x3de0 [ 41.901958][ T1164] ? memset+0x1f/0x40 [ 41.902468][ T1164] ? __nla_validate_parse+0x98/0x1ab0 [ 41.903166][ T1164] unregister_netdevice_many.part.133+0x13/0x1b0 [ 41.903988][ T1164] rtnl_delete_link+0xbc/0x100 [ ... ] Fixes: 60d58f971c10 ("net: qualcomm: rmnet: Implement bridge mode") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: rmnet: use upper/lower device infrastructureTaehee Yoo1-19/+16
netdev_upper_dev_link() is useful to manage lower/upper interfaces. And this function internally validates looping, maximum depth. All or most virtual interfaces that could have a real interface (e.g. macsec, macvlan, ipvlan etc.) use lower/upper infrastructure. Test commands: modprobe rmnet ip link add dummy0 type dummy ip link add rmnet1 link dummy0 type rmnet mux_id 1 for i in {2..100} do let A=$i-1 ip link add rmnet$i link rmnet$A type rmnet mux_id $i done ip link del dummy0 The purpose of the test commands is to make stack overflow. Splat looks like: [ 52.411438][ T1395] BUG: KASAN: slab-out-of-bounds in find_busiest_group+0x27e/0x2c00 [ 52.413218][ T1395] Write of size 64 at addr ffff8880c774bde0 by task ip/1395 [ 52.414841][ T1395] [ 52.430720][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447 [ 52.496511][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 52.513597][ T1395] Call Trace: [ 52.546516][ T1395] [ 52.558773][ T1395] Allocated by task 3171537984: [ 52.588290][ T1395] BUG: unable to handle page fault for address: ffffffffb999e260 [ 52.589311][ T1395] #PF: supervisor read access in kernel mode [ 52.590529][ T1395] #PF: error_code(0x0000) - not-present page [ 52.591374][ T1395] PGD d6818067 P4D d6818067 PUD d6819063 PMD 0 [ 52.592288][ T1395] Thread overran stack, or stack corrupted [ 52.604980][ T1395] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 52.605856][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447 [ 52.611764][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 52.621520][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30 [ 52.622296][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0 [ 52.627887][ T1395] RSP: 0018:ffff8880c774bb60 EFLAGS: 00010006 [ 52.628735][ T1395] RAX: 00000000001f8880 RBX: ffff8880c774d140 RCX: 0000000000000000 [ 52.631773][ T1395] RDX: 000000000000001d RSI: ffff8880c774bb68 RDI: 0000000000003ff0 [ 52.649584][ T1395] RBP: ffffea00031dd200 R08: ffffed101b43e403 R09: ffffed101b43e403 [ 52.674857][ T1395] R10: 0000000000000001 R11: ffffed101b43e402 R12: ffff8880d900e5c0 [ 52.678257][ T1395] R13: ffff8880c774c000 R14: 0000000000000000 R15: dffffc0000000000 [ 52.694541][ T1395] FS: 00007fe867f6e0c0(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000 [ 52.764039][ T1395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 52.815008][ T1395] CR2: ffffffffb999e260 CR3: 00000000c26aa005 CR4: 00000000000606e0 [ 52.862312][ T1395] Call Trace: [ 52.887133][ T1395] Modules linked in: dummy rmnet veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_dex [ 52.936749][ T1395] CR2: ffffffffb999e260 [ 52.965695][ T1395] ---[ end trace 7e32ca99482dbb31 ]--- [ 52.966556][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30 [ 52.971083][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0 [ 53.003650][ T1395] RSP: 0018:ffff8880c774bb60 EFLAGS: 00010006 [ 53.043183][ T1395] RAX: 00000000001f8880 RBX: ffff8880c774d140 RCX: 0000000000000000 [ 53.076480][ T1395] RDX: 000000000000001d RSI: ffff8880c774bb68 RDI: 0000000000003ff0 [ 53.093858][ T1395] RBP: ffffea00031dd200 R08: ffffed101b43e403 R09: ffffed101b43e403 [ 53.112795][ T1395] R10: 0000000000000001 R11: ffffed101b43e402 R12: ffff8880d900e5c0 [ 53.139837][ T1395] R13: ffff8880c774c000 R14: 0000000000000000 R15: dffffc0000000000 [ 53.141500][ T1395] FS: 00007fe867f6e0c0(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000 [ 53.143343][ T1395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 53.152007][ T1395] CR2: ffffffffb999e260 CR3: 00000000c26aa005 CR4: 00000000000606e0 [ 53.156459][ T1395] Kernel panic - not syncing: Fatal exception [ 54.213570][ T1395] Shutting down cpus with NMI [ 54.354112][ T1395] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x) [ 54.355687][ T1395] Rebooting in 5 seconds.. Fixes: b37f78f234bf ("net: qualcomm: rmnet: Fix crash on real dev unregistration") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: rmnet: do not allow to change mux id if mux id is duplicatedTaehee Yoo1-0/+4
Basically, duplicate mux id isn't be allowed. So, the creation of rmnet will be failed if there is duplicate mux id is existing. But, changelink routine doesn't check duplicate mux id. Test commands: modprobe rmnet ip link add dummy0 type dummy ip link add rmnet0 link dummy0 type rmnet mux_id 1 ip link add rmnet1 link dummy0 type rmnet mux_id 2 ip link set rmnet1 type rmnet mux_id 1 Fixes: 23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()Taehee Yoo1-2/+0
The notifier_call() of the slave interface removes rmnet interface with unregister_netdevice_queue(). But, before calling unregister_netdevice_queue(), it acquires rcu readlock. In the RCU critical section, sleeping isn't be allowed. But, unregister_netdevice_queue() internally calls synchronize_net(), which would sleep. So, suspicious RCU usage warning occurs. Test commands: modprobe rmnet ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add rmnet0 link dummy0 type rmnet mux_id 1 ip link set dummy1 master rmnet0 ip link del dummy0 Splat looks like: [ 79.639245][ T1195] ============================= [ 79.640134][ T1195] WARNING: suspicious RCU usage [ 79.640852][ T1195] 5.6.0-rc1+ #447 Not tainted [ 79.641657][ T1195] ----------------------------- [ 79.642472][ T1195] ./include/linux/rcupdate.h:273 Illegal context switch in RCU read-side critical section! [ 79.644043][ T1195] [ 79.644043][ T1195] other info that might help us debug this: [ 79.644043][ T1195] [ 79.645682][ T1195] [ 79.645682][ T1195] rcu_scheduler_active = 2, debug_locks = 1 [ 79.646980][ T1195] 2 locks held by ip/1195: [ 79.647629][ T1195] #0: ffffffffa3cf64f0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890 [ 79.649312][ T1195] #1: ffffffffa39256c0 (rcu_read_lock){....}, at: rmnet_config_notify_cb+0xf0/0x590 [rmnet] [ 79.651717][ T1195] [ 79.651717][ T1195] stack backtrace: [ 79.652650][ T1195] CPU: 3 PID: 1195 Comm: ip Not tainted 5.6.0-rc1+ #447 [ 79.653702][ T1195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 79.655037][ T1195] Call Trace: [ 79.655560][ T1195] dump_stack+0x96/0xdb [ 79.656252][ T1195] ___might_sleep+0x345/0x440 [ 79.656994][ T1195] synchronize_net+0x18/0x30 [ 79.661132][ T1195] netdev_rx_handler_unregister+0x40/0xb0 [ 79.666266][ T1195] rmnet_unregister_real_device+0x42/0xb0 [rmnet] [ 79.667211][ T1195] rmnet_config_notify_cb+0x1f7/0x590 [rmnet] [ 79.668121][ T1195] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet] [ 79.669166][ T1195] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet] [ 79.670286][ T1195] ? __module_text_address+0x13/0x140 [ 79.671139][ T1195] notifier_call_chain+0x90/0x160 [ 79.671973][ T1195] rollback_registered_many+0x660/0xcf0 [ 79.672893][ T1195] ? netif_set_real_num_tx_queues+0x780/0x780 [ 79.675091][ T1195] ? __lock_acquire+0xdfe/0x3de0 [ 79.675825][ T1195] ? memset+0x1f/0x40 [ 79.676367][ T1195] ? __nla_validate_parse+0x98/0x1ab0 [ 79.677290][ T1195] unregister_netdevice_many.part.133+0x13/0x1b0 [ 79.678163][ T1195] rtnl_delete_link+0xbc/0x100 [ ... ] Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: rmnet: fix suspicious RCU usageTaehee Yoo3-10/+9
rmnet_get_port() internally calls rcu_dereference_rtnl(), which checks RTNL. But rmnet_get_port() could be called by packet path. The packet path is not protected by RTNL. So, the suspicious RCU usage problem occurs. Test commands: modprobe rmnet ip netns add nst ip link add veth0 type veth peer name veth1 ip link set veth1 netns nst ip link add rmnet0 link veth0 type rmnet mux_id 1 ip netns exec nst ip link add rmnet1 link veth1 type rmnet mux_id 1 ip netns exec nst ip link set veth1 up ip netns exec nst ip link set rmnet1 up ip netns exec nst ip a a 192.168.100.2/24 dev rmnet1 ip link set veth0 up ip link set rmnet0 up ip a a 192.168.100.1/24 dev rmnet0 ping 192.168.100.2 Splat looks like: [ 146.630958][ T1174] WARNING: suspicious RCU usage [ 146.631735][ T1174] 5.6.0-rc1+ #447 Not tainted [ 146.632387][ T1174] ----------------------------- [ 146.633151][ T1174] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:386 suspicious rcu_dereference_check() ! [ 146.634742][ T1174] [ 146.634742][ T1174] other info that might help us debug this: [ 146.634742][ T1174] [ 146.645992][ T1174] [ 146.645992][ T1174] rcu_scheduler_active = 2, debug_locks = 1 [ 146.646937][ T1174] 5 locks held by ping/1174: [ 146.647609][ T1174] #0: ffff8880c31dea70 (sk_lock-AF_INET){+.+.}, at: raw_sendmsg+0xab8/0x2980 [ 146.662463][ T1174] #1: ffffffff93925660 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x243/0x2150 [ 146.671696][ T1174] #2: ffffffff93925660 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x213/0x2940 [ 146.673064][ T1174] #3: ffff8880c19ecd58 (&dev->qdisc_running_key#7){+...}, at: ip_finish_output2+0x714/0x2150 [ 146.690358][ T1174] #4: ffff8880c5796898 (&dev->qdisc_xmit_lock_key#3){+.-.}, at: sch_direct_xmit+0x1e2/0x1020 [ 146.699875][ T1174] [ 146.699875][ T1174] stack backtrace: [ 146.701091][ T1174] CPU: 0 PID: 1174 Comm: ping Not tainted 5.6.0-rc1+ #447 [ 146.705215][ T1174] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 146.706565][ T1174] Call Trace: [ 146.707102][ T1174] dump_stack+0x96/0xdb [ 146.708007][ T1174] rmnet_get_port.part.9+0x76/0x80 [rmnet] [ 146.709233][ T1174] rmnet_egress_handler+0x107/0x420 [rmnet] [ 146.710492][ T1174] ? sch_direct_xmit+0x1e2/0x1020 [ 146.716193][ T1174] rmnet_vnd_start_xmit+0x3d/0xa0 [rmnet] [ 146.717012][ T1174] dev_hard_start_xmit+0x160/0x740 [ 146.717854][ T1174] sch_direct_xmit+0x265/0x1020 [ 146.718577][ T1174] ? register_lock_class+0x14d0/0x14d0 [ 146.719429][ T1174] ? dev_watchdog+0xac0/0xac0 [ 146.723738][ T1174] ? __dev_queue_xmit+0x15fd/0x2940 [ 146.724469][ T1174] ? lock_acquire+0x164/0x3b0 [ 146.725172][ T1174] __dev_queue_xmit+0x20c7/0x2940 [ ... ] Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: rmnet: fix NULL pointer dereference in rmnet_changelink()Taehee Yoo1-4/+2
In the rmnet_changelink(), it uses IFLA_LINK without checking NULL pointer. tb[IFLA_LINK] could be NULL pointer. So, NULL-ptr-deref could occur. rmnet already has a lower interface (real_dev). So, after this patch, rmnet_changelink() does not use IFLA_LINK anymore. Test commands: modprobe rmnet ip link add dummy0 type dummy ip link add rmnet0 link dummy0 type rmnet mux_id 1 ip link set rmnet0 type rmnet mux_id 2 Splat looks like: [ 90.578726][ T1131] general protection fault, probably for non-canonical address 0xdffffc0000000000I [ 90.581121][ T1131] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 90.582380][ T1131] CPU: 2 PID: 1131 Comm: ip Not tainted 5.6.0-rc1+ #447 [ 90.584285][ T1131] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 90.587506][ T1131] RIP: 0010:rmnet_changelink+0x5a/0x8a0 [rmnet] [ 90.588546][ T1131] Code: 83 ec 20 48 c1 ea 03 80 3c 02 00 0f 85 6f 07 00 00 48 8b 5e 28 48 b8 00 00 00 00 00 0 [ 90.591447][ T1131] RSP: 0018:ffff8880ce78f1b8 EFLAGS: 00010247 [ 90.592329][ T1131] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff8880ce78f8b0 [ 90.593253][ T1131] RDX: 0000000000000000 RSI: ffff8880ce78f4a0 RDI: 0000000000000004 [ 90.594058][ T1131] RBP: ffff8880cf543e00 R08: 0000000000000002 R09: 0000000000000002 [ 90.594859][ T1131] R10: ffffffffc0586a40 R11: 0000000000000000 R12: ffff8880ca47c000 [ 90.595690][ T1131] R13: ffff8880ca47c000 R14: ffff8880cf545000 R15: 0000000000000000 [ 90.596553][ T1131] FS: 00007f21f6c7e0c0(0000) GS:ffff8880da400000(0000) knlGS:0000000000000000 [ 90.597504][ T1131] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 90.599418][ T1131] CR2: 0000556e413db458 CR3: 00000000c917a002 CR4: 00000000000606e0 [ 90.600289][ T1131] Call Trace: [ 90.600631][ T1131] __rtnl_newlink+0x922/0x1270 [ 90.601194][ T1131] ? lock_downgrade+0x6e0/0x6e0 [ 90.601724][ T1131] ? rtnl_link_unregister+0x220/0x220 [ 90.602309][ T1131] ? lock_acquire+0x164/0x3b0 [ 90.602784][ T1131] ? is_bpf_image_address+0xff/0x1d0 [ 90.603331][ T1131] ? rtnl_newlink+0x4c/0x90 [ 90.603810][ T1131] ? kernel_text_address+0x111/0x140 [ 90.604419][ T1131] ? __kernel_text_address+0xe/0x30 [ 90.604981][ T1131] ? unwind_get_return_address+0x5f/0xa0 [ 90.605616][ T1131] ? create_prof_cpu_mask+0x20/0x20 [ 90.606304][ T1131] ? arch_stack_walk+0x83/0xb0 [ 90.606985][ T1131] ? stack_trace_save+0x82/0xb0 [ 90.607656][ T1131] ? stack_trace_consume_entry+0x160/0x160 [ 90.608503][ T1131] ? deactivate_slab.isra.78+0x2c5/0x800 [ 90.609336][ T1131] ? kasan_unpoison_shadow+0x30/0x40 [ 90.610096][ T1131] ? kmem_cache_alloc_trace+0x135/0x350 [ 90.610889][ T1131] ? rtnl_newlink+0x4c/0x90 [ 90.611512][ T1131] rtnl_newlink+0x65/0x90 [ ... ] Fixes: 23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: rmnet: fix NULL pointer dereference in rmnet_newlink()Taehee Yoo1-0/+5
rmnet registers IFLA_LINK interface as a lower interface. But, IFLA_LINK could be NULL. In the current code, rmnet doesn't check IFLA_LINK. So, panic would occur. Test commands: modprobe rmnet ip link add rmnet0 type rmnet mux_id 1 Splat looks like: [ 36.826109][ T1115] general protection fault, probably for non-canonical address 0xdffffc0000000000I [ 36.838817][ T1115] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 36.839908][ T1115] CPU: 1 PID: 1115 Comm: ip Not tainted 5.6.0-rc1+ #447 [ 36.840569][ T1115] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 36.841408][ T1115] RIP: 0010:rmnet_newlink+0x54/0x510 [rmnet] [ 36.841986][ T1115] Code: 83 ec 18 48 c1 e9 03 80 3c 01 00 0f 85 d4 03 00 00 48 8b 6a 28 48 b8 00 00 00 00 00 c [ 36.843923][ T1115] RSP: 0018:ffff8880b7e0f1c0 EFLAGS: 00010247 [ 36.844756][ T1115] RAX: dffffc0000000000 RBX: ffff8880d14cca00 RCX: 1ffff11016fc1e99 [ 36.845859][ T1115] RDX: 0000000000000000 RSI: ffff8880c3d04000 RDI: 0000000000000004 [ 36.846961][ T1115] RBP: 0000000000000000 R08: ffff8880b7e0f8b0 R09: ffff8880b6ac2d90 [ 36.848020][ T1115] R10: ffffffffc0589a40 R11: ffffed1016d585b7 R12: ffffffff88ceaf80 [ 36.848788][ T1115] R13: ffff8880c3d04000 R14: ffff8880b7e0f8b0 R15: ffff8880c3d04000 [ 36.849546][ T1115] FS: 00007f50ab3360c0(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000 [ 36.851784][ T1115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 36.852422][ T1115] CR2: 000055871afe5ab0 CR3: 00000000ae246001 CR4: 00000000000606e0 [ 36.853181][ T1115] Call Trace: [ 36.853514][ T1115] __rtnl_newlink+0xbdb/0x1270 [ 36.853967][ T1115] ? lock_downgrade+0x6e0/0x6e0 [ 36.854420][ T1115] ? rtnl_link_unregister+0x220/0x220 [ 36.854936][ T1115] ? lock_acquire+0x164/0x3b0 [ 36.855376][ T1115] ? is_bpf_image_address+0xff/0x1d0 [ 36.855884][ T1115] ? rtnl_newlink+0x4c/0x90 [ 36.856304][ T1115] ? kernel_text_address+0x111/0x140 [ 36.856857][ T1115] ? __kernel_text_address+0xe/0x30 [ 36.857440][ T1115] ? unwind_get_return_address+0x5f/0xa0 [ 36.858063][ T1115] ? create_prof_cpu_mask+0x20/0x20 [ 36.858644][ T1115] ? arch_stack_walk+0x83/0xb0 [ 36.859171][ T1115] ? stack_trace_save+0x82/0xb0 [ 36.859710][ T1115] ? stack_trace_consume_entry+0x160/0x160 [ 36.860357][ T1115] ? deactivate_slab.isra.78+0x2c5/0x800 [ 36.860928][ T1115] ? kasan_unpoison_shadow+0x30/0x40 [ 36.861520][ T1115] ? kmem_cache_alloc_trace+0x135/0x350 [ 36.862125][ T1115] ? rtnl_newlink+0x4c/0x90 [ 36.864073][ T1115] rtnl_newlink+0x65/0x90 [ ... ] Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27net: phy: marvell: don't interpret PHY status unless resolvedRussell King1-0/+5
Don't attempt to interpret the PHY specific status register unless the PHY is indicating that the resolution is valid. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27mlx5: register lag notifier for init network namespace onlyJiri Pirko5-14/+6
The current code causes problems when the unregistering netdevice could be different then the registering one. Since the check in mlx5_lag_netdev_event() does not allow any other network namespace anyway, fix this by registerting the lag notifier per init network namespace only. Fixes: d48834f9d4b4 ("mlx5: Use dev_net netdevice notifier registrations") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Aya Levin <ayal@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27unix: define and set show_fdinfo only if procfs is enabledTobias Klauser1-0/+4
Follow the pattern used with other *_show_fdinfo functions and only define unix_show_fdinfo and set it in proto_ops if CONFIG_PROCFS is set. Fixes: 3c32da19a858 ("unix: Show number of pending scm files of receive queue in fdinfo") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27hinic: fix a bug of rss configurationLuo bin1-1/+2
should use real receive queue number to configure hw rss indirect table rather than maximal queue number Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27hinic: fix a bug of setting hw_ioctxtLuo bin3-1/+3
a reserved field is used to signify prime physical function index in the latest firmware version, so we must assign a value to it correctly Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27hinic: fix a irq affinity bugLuo bin2-3/+3
can not use a local variable as an input parameter of irq_set_affinity_hint Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>