aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/include (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-04-05paride/pcd: Fix potential NULL pointer dereference and mem leakYueHaibing1-1/+13
Syzkaller report this: pcd: pcd version 1.07, major 46, nice 0 pcd0: Autoprobe failed pcd: No CD-ROM drive found kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 4525 Comm: syz-executor.0 Not tainted 5.1.0-rc3+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:pcd_init+0x95c/0x1000 [pcd] Code: c4 ab f7 48 89 d8 48 c1 e8 03 80 3c 28 00 74 08 48 89 df e8 56 a3 da f7 4c 8b 23 49 8d bc 24 80 05 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 74 05 e8 39 a3 da f7 49 8b bc 24 80 05 00 00 e8 cc b2 RSP: 0018:ffff8881e84df880 EFLAGS: 00010202 RAX: 00000000000000b0 RBX: ffffffffc155a088 RCX: ffffffffc1508935 RDX: 0000000000040000 RSI: ffffc900014f0000 RDI: 0000000000000580 RBP: dffffc0000000000 R08: ffffed103ee658b8 R09: ffffed103ee658b8 R10: 0000000000000001 R11: ffffed103ee658b7 R12: 0000000000000000 R13: ffffffffc155a778 R14: ffffffffc155a4a8 R15: 0000000000000003 FS: 00007fe71bee3700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a7334441a8 CR3: 00000001e9674003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? 0xffffffffc1508000 ? 0xffffffffc1508000 do_one_initcall+0xbc/0x47d init/main.c:901 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fe71bee2c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00007fe71bee2c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe71bee36bc R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004 Modules linked in: pcd(+) paride solos_pci atm ts_fsm rtc_mt6397 mac80211 nhc_mobility nhc_udp nhc_ipv6 nhc_hop nhc_dest nhc_fragment nhc_routing 6lowpan rtc_cros_ec memconsole intel_xhci_usb_role_switch roles rtc_wm8350 usbcore industrialio_triggered_buffer kfifo_buf industrialio asc7621 dm_era dm_persistent_data dm_bufio dm_mod tpm gnss_ubx gnss_serial serdev gnss max2165 cpufreq_dt hid_penmount hid menf21bmc_wdt rc_core n_tracesink ide_gd_mod cdns_csi2tx v4l2_fwnode videodev media pinctrl_lewisburg pinctrl_intel iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd ide_pci_generic piix input_leds cryptd glue_helper psmouse ide_core intel_agp serio_raw intel_gtt ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: bmc150_magn] Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace d873691c3cd69f56 ]--- If alloc_disk fails in pcd_init_units, cd->disk will be NULL, however in pcd_detect and pcd_exit, it's not check this before free.It may result a NULL pointer dereference. Also when register_blkdev failed, blk_cleanup_queue() and blk_mq_free_tag_set() should be called to free resources. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 81b74ac68c28 ("paride/pcd: cleanup queues when detection fails") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-05syscalls: Remove start and number from syscall_set_arguments() argsSteven Rostedt (VMware)20-304/+88
After removing the start and count arguments of syscall_get_arguments() it seems reasonable to remove them from syscall_set_arguments(). Note, as of today, there are no users of syscall_set_arguments(). But we are told that there will be soon. But for now, at least make it consistent with syscall_get_arguments(). Link: http://lkml.kernel.org/r/20190327222014.GA32540@altlinux.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Dave Martin <dave.martin@arm.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86 Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-05syscalls: Remove start and number from syscall_get_arguments() argsSteven Rostedt (Red Hat)29-378/+113
At Linux Plumbers, Andy Lutomirski approached me and pointed out that the function call syscall_get_arguments() implemented in x86 was horribly written and not optimized for the standard case of passing in 0 and 6 for the starting index and the number of system calls to get. When looking at all the users of this function, I discovered that all instances pass in only 0 and 6 for these arguments. Instead of having this function handle different cases that are never used, simply rewrite it to return the first 6 arguments of a system call. This should help out the performance of tracing system calls by ptrace, ftrace and perf. Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Dave Martin <dave.martin@arm.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86 Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-05xen: Prevent buffer overflow in privcmd ioctlDan Carpenter1-0/+3
The "call" variable comes from the user in privcmd_ioctl_hypercall(). It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32) elements. We need to put an upper bound on it to prevent an out of bounds access. Cc: stable@vger.kernel.org Fixes: 1246ae0bb992 ("xen: add variable hypercall caller") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-04-05xen: use struct_size() helper in kzalloc()Andrea Righi1-2/+1
struct privcmd_buf_vma_private has a zero-sized array at the end (pages), use the new struct_size() helper to determine the proper allocation size and avoid potential type mistakes. Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-04-04ibmvnic: Fix completion structure initializationThomas Falcon1-2/+3
Fix device initialization completion handling for vNIC adapters. Initialize the completion structure on probe and reinitialize when needed. This also fixes a race condition during kdump where the driver can attempt to access the completion struct before it is initialized: Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc0000000081acbe0 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1 NIP: c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900 REGS: c000000027f3f990 TRAP: 0300 Not tainted (4.18.0-64.el8.ppc64le) MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288 XER: 00000006 CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58 GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4 GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28 GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100 GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006 GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000 GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003 GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58 NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100 LR [c0000000081ad964] complete+0x64/0xa0 Call Trace: [c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable) [c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0 [c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic] [c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic] [c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0 [c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400 [c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0 [c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210 [c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24 [c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130 [c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120 Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04ipv6: sit: reset ip header pointer in ipip6_rcvLorenzo Bianconi1-0/+4
ipip6 tunnels run iptunnel_pull_header on received skbs. This can determine the following use-after-free accessing iph pointer since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb (e.g if the packet has been sent though a veth device) [ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit] [ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/= [ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016 [ 706.771839] Call trace: [ 706.801159] dump_backtrace+0x0/0x2f8 [ 706.845079] show_stack+0x24/0x30 [ 706.884833] dump_stack+0xe0/0x11c [ 706.925629] print_address_description+0x68/0x260 [ 706.982070] kasan_report+0x178/0x340 [ 707.025995] __asan_report_load1_noabort+0x30/0x40 [ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit] [ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 707.185940] ip_local_deliver_finish+0x3b8/0x988 [ 707.241338] ip_local_deliver+0x144/0x470 [ 707.289436] ip_rcv_finish+0x43c/0x14b0 [ 707.335447] ip_rcv+0x628/0x1138 [ 707.374151] __netif_receive_skb_core+0x1670/0x2600 [ 707.432680] __netif_receive_skb+0x28/0x190 [ 707.482859] process_backlog+0x1d0/0x610 [ 707.529913] net_rx_action+0x37c/0xf68 [ 707.574882] __do_softirq+0x288/0x1018 [ 707.619852] run_ksoftirqd+0x70/0xa8 [ 707.662734] smpboot_thread_fn+0x3a4/0x9e8 [ 707.711875] kthread+0x2c8/0x350 [ 707.750583] ret_from_fork+0x10/0x18 [ 707.811302] Allocated by task 16982: [ 707.854182] kasan_kmalloc.part.1+0x40/0x108 [ 707.905405] kasan_kmalloc+0xb4/0xc8 [ 707.948291] kasan_slab_alloc+0x14/0x20 [ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0 [ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0 [ 708.108280] __alloc_skb+0xd8/0x400 [ 708.150139] sk_stream_alloc_skb+0xa4/0x638 [ 708.200346] tcp_sendmsg_locked+0x818/0x2b90 [ 708.251581] tcp_sendmsg+0x40/0x60 [ 708.292376] inet_sendmsg+0xf0/0x520 [ 708.335259] sock_sendmsg+0xac/0xf8 [ 708.377096] sock_write_iter+0x1c0/0x2c0 [ 708.424154] new_sync_write+0x358/0x4a8 [ 708.470162] __vfs_write+0xc4/0xf8 [ 708.510950] vfs_write+0x12c/0x3d0 [ 708.551739] ksys_write+0xcc/0x178 [ 708.592533] __arm64_sys_write+0x70/0xa0 [ 708.639593] el0_svc_handler+0x13c/0x298 [ 708.686646] el0_svc+0x8/0xc [ 708.739019] Freed by task 17: [ 708.774597] __kasan_slab_free+0x114/0x228 [ 708.823736] kasan_slab_free+0x10/0x18 [ 708.868703] kfree+0x100/0x3d8 [ 708.905320] skb_free_head+0x7c/0x98 [ 708.948204] skb_release_data+0x320/0x490 [ 708.996301] pskb_expand_head+0x60c/0x970 [ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0 [ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit] [ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 709.200195] ip_local_deliver_finish+0x3b8/0x988 [ 709.255596] ip_local_deliver+0x144/0x470 [ 709.303692] ip_rcv_finish+0x43c/0x14b0 [ 709.349705] ip_rcv+0x628/0x1138 [ 709.388413] __netif_receive_skb_core+0x1670/0x2600 [ 709.446943] __netif_receive_skb+0x28/0x190 [ 709.497120] process_backlog+0x1d0/0x610 [ 709.544169] net_rx_action+0x37c/0xf68 [ 709.589131] __do_softirq+0x288/0x1018 [ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580 which belongs to the cache kmalloc-1024 of size 1024 [ 709.804356] The buggy address is located 117 bytes inside of 1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980) [ 709.946340] The buggy address belongs to the page: [ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0 [ 710.099914] flags: 0xfffff8000000100(slab) [ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600 [ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000 [ 710.334966] page dumped because: kasan: bad access detected Fix it resetting iph pointer after iptunnel_pull_header Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Tested-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: bridge: always clear mcast matching struct on reports and leavesNikolay Aleksandrov1-0/+3
We need to be careful and always zero the whole br_ip struct when it is used for matching since the rhashtable change. This patch fixes all the places which didn't properly clear it which in turn might've caused mismatches. Thanks for the great bug report with reproducing steps and bisection. Steps to reproduce (from the bug report): ip link add br0 type bridge mcast_querier 1 ip link set br0 up ip link add v2 type veth peer name v3 ip link set v2 master br0 ip link set v2 up ip link set v3 up ip addr add 3.0.0.2/24 dev v3 ip netns add test ip link add v1 type veth peer name v1 netns test ip link set v1 master br0 ip link set v1 up ip -n test link set v1 up ip -n test addr add 3.0.0.1/24 dev v1 # Multicast receiver ip netns exec test socat UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork - # Multicast sender echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588 Reported-by: liam.mcbirnie@boeing.com Fixes: 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04libcxgb: fix incorrect ppmax calculationVarun Prakash1-1/+8
BITS_TO_LONGS() uses DIV_ROUND_UP() because of this ppmax value can be greater than available per cpu page pods. This patch removes BITS_TO_LONGS() to fix this issue. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2xChris Leech1-11/+15
Way back in 3c9c36bcedd426f2be2826da43e5163de61735f7 the ndo_fcoe_get_wwn pointer was switched from depending on CONFIG_FCOE to CONFIG_LIBFCOE in order to allow building FCoE support into the bnx2x driver and used by bnx2fc without including the generic software fcoe module. But, FCoE is generally used over an 802.1q VLAN, and the implementation of ndo_fcoe_get_wwn in the 8021q module was not similarly changed. The result is that if CONFIG_FCOE is disabled, then bnz2fc cannot make a call to ndo_fcoe_get_wwn through the 8021q interface to the underlying bnx2x interface. The bnx2fc driver then falls back to a potentially different mapping of Ethernet MAC to Fibre Channel WWN, creating an incompatibility with the fabric and target configurations when compared to the WWNs used by pre-boot firmware and differently-configured kernels. So make the conditional inclusion of FCoE code in 8021q match the conditional inclusion in netdevice.h Signed-off-by: Chris Leech <cleech@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_bufferLiu Jian1-1/+5
In function do_write_buffer(), in the for loop, there is a case chip_ready() returns 1 while chip_good() returns 0, so it never break the loop. To fix this, chip_good() is enough and it should timeout if it stay bad for a while. Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value") Signed-off-by: Yi Huaijie <yihuaijie@huawei.com> Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Tokunori Ikegami <ikegami_to@yahoo.co.jp> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-04-04dm: disable DISCARD if the underlying storage no longer supports itMike Snitzer3-8/+24
Storage devices which report supporting discard commands like WRITE_SAME_16 with unmap, but reject discard commands sent to the storage device. This is a clear storage firmware bug but it doesn't change the fact that should a program cause discards to be sent to a multipath device layered on this buggy storage, all paths can end up failed at the same time from the discards, causing possible I/O loss. The first discard to a path will fail with Illegal Request, Invalid field in cdb, e.g.: kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current] kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00 kernel: blk_update_request: critical target error, dev sdfn, sector 10487808 The SCSI layer converts this to the BLK_STS_TARGET error number, the sd device disables its support for discard on this path, and because of the BLK_STS_TARGET error multipath fails the discard without failing any path or retrying down a different path. But subsequent discards can cause path failures. Any discards sent to the path which already failed a discard ends up failing with EIO from blk_cloned_rq_check_limits with an "over max size limit" error since the discard limit was set to 0 by the sd driver for the path. As the error is EIO, this now fails the path and multipath tries to send the discard down the next path. This cycle continues as discards are sent until all paths fail. Fix this by training DM core to disable DISCARD if the underlying storage already did so. Also, fix branching in dm_done() and clone_endio() to reflect the mutually exclussive nature of the IO operations in question. Cc: stable@vger.kernel.org Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-04sch_cake: Make sure we can write the IP header before changing DSCP bitsToke Høiland-Jørgensen1-0/+11
There is not actually any guarantee that the IP headers are valid before we access the DSCP bits of the packets. Fix this using the same approach taken in sch_dsmark. Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04sch_cake: Use tc_skb_protocol() helper for getting packet protocolToke Høiland-Jørgensen1-1/+1
We shouldn't be using skb->protocol directly as that will miss cases with hardware-accelerated VLAN tags. Use the helper instead to get the right protocol number. Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04tcp: Ensure DCTCP reacts to lossesKoen De Schepper1-18/+18
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to loss episodes in the same way as conventional TCP". Currently, Linux DCTCP performs no cwnd reduction when losses are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets alpha to its maximal value if a RTO happens. This behavior is sub-optimal for at least two reasons: i) it ignores losses triggering fast retransmissions; and ii) it causes unnecessary large cwnd reduction in the future if the loss was isolated as it resets the historical term of DCTCP's alpha EWMA to its maximal value (i.e., denoting a total congestion). The second reason has an especially noticeable effect when using DCTCP in high BDP environments, where alpha normally stays at low values. This patch replace the clamping of alpha by setting ssthresh to half of cwnd for both fast retransmissions and RTOs, at most once per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter has been removed. The table below shows experimental results where we measured the drop probability of a PIE AQM (not applying ECN marks) at a bottleneck in the presence of a single TCP flow with either the alpha-clamping option enabled or the cwnd halving proposed by this patch. Results using reno or cubic are given for comparison. | Link | RTT | Drop TCP CC | speed | base+AQM | probability ==================|=========|==========|============ CUBIC | 40Mbps | 7+20ms | 0.21% RENO | | | 0.19% DCTCP-CLAMP-ALPHA | | | 25.80% DCTCP-HALVE-CWND | | | 0.22% ------------------|---------|----------|------------ CUBIC | 100Mbps | 7+20ms | 0.03% RENO | | | 0.02% DCTCP-CLAMP-ALPHA | | | 23.30% DCTCP-HALVE-CWND | | | 0.04% ------------------|---------|----------|------------ CUBIC | 800Mbps | 1+1ms | 0.04% RENO | | | 0.05% DCTCP-CLAMP-ALPHA | | | 18.70% DCTCP-HALVE-CWND | | | 0.06% We see that, without halving its cwnd for all source of losses, DCTCP drives the AQM to large drop probabilities in order to keep the queue length under control (i.e., it repeatedly faces RTOs). Instead, if DCTCP reacts to all source of losses, it can then be controlled by the AQM using similar drop levels than cubic or reno. Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com> Cc: Bob Briscoe <research@bobbriscoe.net> Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <borkmann@iogearbox.net> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Andrew Shewmaker <agshew@gmail.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net/sched: act_sample: fix divide by zero in the traffic pathDavide Caratti2-2/+32
the control path of 'sample' action does not validate the value of 'rate' provided by the user, but then it uses it as divisor in the traffic path. Validate it in tcf_sample_init(), and return -EINVAL with a proper extack message in case that value is zero, to fix a splat with the script below: # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2 # tc -s a s action sample total acts 1 action order 0: sample rate 1/0 group 1 pipe index 2 ref 1 bind 1 installed 19 sec used 19 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 # ping 192.0.2.1 -I test0 -c1 -q divide error: 0000 [#1] SMP PTI CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample] Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000 FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0 Call Trace: tcf_action_exec+0x7c/0x1c0 tcf_classify+0x57/0x160 __dev_queue_xmit+0x3dc/0xd10 ip_finish_output2+0x257/0x6d0 ip_output+0x75/0x280 ip_send_skb+0x15/0x40 raw_sendmsg+0xae3/0x1410 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] Kernel panic - not syncing: Fatal exception in interrupt Add a TDC selftest to document that 'rate' is now being validated. Reported-by: Matteo Croce <mcroce@redhat.com> Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Yotam Gigi <yotam.gi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stopLorenzo Bianconi1-8/+12
When a bpf program is uploaded, the driver computes the number of xdp tx queues resulting in the allocation of additional qsets. Starting from commit '2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them")' the driver runs link state polling for each VF resulting in the following NULL pointer dereference: [ 56.169256] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [ 56.178032] Mem abort info: [ 56.180834] ESR = 0x96000005 [ 56.183877] Exception class = DABT (current EL), IL = 32 bits [ 56.189792] SET = 0, FnV = 0 [ 56.192834] EA = 0, S1PTW = 0 [ 56.195963] Data abort info: [ 56.198831] ISV = 0, ISS = 0x00000005 [ 56.202662] CM = 0, WnR = 0 [ 56.205619] user pgtable: 64k pages, 48-bit VAs, pgdp = 0000000021f0c7a0 [ 56.212315] [0000000000000020] pgd=0000000000000000, pud=0000000000000000 [ 56.219094] Internal error: Oops: 96000005 [#1] SMP [ 56.260459] CPU: 39 PID: 2034 Comm: ip Not tainted 5.1.0-rc3+ #3 [ 56.266452] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018 [ 56.273315] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 56.278098] pc : __ll_sc___cmpxchg_case_acq_64+0x4/0x20 [ 56.283312] lr : mutex_lock+0x2c/0x50 [ 56.286962] sp : ffff0000219af1b0 [ 56.290264] x29: ffff0000219af1b0 x28: ffff800f64de49a0 [ 56.295565] x27: 0000000000000000 x26: 0000000000000015 [ 56.300865] x25: 0000000000000000 x24: 0000000000000000 [ 56.306165] x23: 0000000000000000 x22: ffff000011117000 [ 56.311465] x21: ffff800f64dfc080 x20: 0000000000000020 [ 56.316766] x19: 0000000000000020 x18: 0000000000000001 [ 56.322066] x17: 0000000000000000 x16: ffff800f2e077080 [ 56.327367] x15: 0000000000000004 x14: 0000000000000000 [ 56.332667] x13: ffff000010964438 x12: 0000000000000002 [ 56.337967] x11: 0000000000000000 x10: 0000000000000c70 [ 56.343268] x9 : ffff0000219af120 x8 : ffff800f2e077d50 [ 56.348568] x7 : 0000000000000027 x6 : 000000062a9d6a84 [ 56.353869] x5 : 0000000000000000 x4 : ffff800f2e077480 [ 56.359169] x3 : 0000000000000008 x2 : ffff800f2e077080 [ 56.364469] x1 : 0000000000000000 x0 : 0000000000000020 [ 56.369770] Process ip (pid: 2034, stack limit = 0x00000000c862da3a) [ 56.376110] Call trace: [ 56.378546] __ll_sc___cmpxchg_case_acq_64+0x4/0x20 [ 56.383414] drain_workqueue+0x34/0x198 [ 56.387247] nicvf_open+0x48/0x9e8 [nicvf] [ 56.391334] nicvf_open+0x898/0x9e8 [nicvf] [ 56.395507] nicvf_xdp+0x1bc/0x238 [nicvf] [ 56.399595] dev_xdp_install+0x68/0x90 [ 56.403333] dev_change_xdp_fd+0xc8/0x240 [ 56.407333] do_setlink+0x8e0/0xbe8 [ 56.410810] __rtnl_newlink+0x5b8/0x6d8 [ 56.414634] rtnl_newlink+0x54/0x80 [ 56.418112] rtnetlink_rcv_msg+0x22c/0x2f8 [ 56.422199] netlink_rcv_skb+0x60/0x120 [ 56.426023] rtnetlink_rcv+0x28/0x38 [ 56.429587] netlink_unicast+0x1c8/0x258 [ 56.433498] netlink_sendmsg+0x1b4/0x350 [ 56.437410] sock_sendmsg+0x4c/0x68 [ 56.440887] ___sys_sendmsg+0x240/0x280 [ 56.444711] __sys_sendmsg+0x68/0xb0 [ 56.448275] __arm64_sys_sendmsg+0x2c/0x38 [ 56.452361] el0_svc_handler+0x9c/0x128 [ 56.456186] el0_svc+0x8/0xc [ 56.459056] Code: 35ffff91 2a1003e0 d65f03c0 f9800011 (c85ffc10) [ 56.465166] ---[ end trace 4a57fdc27b0a572c ]--- [ 56.469772] Kernel panic - not syncing: Fatal exception Fix it by checking nicvf_rx_mode_wq pointer in nicvf_open and nicvf_stop Fixes: 2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them") Fixes: 2c632ad8bc74 ("net: thunderx: move link state polling function to VF") Reported-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: hns: Fix sparse: some warnings in HNS driversYonglong Liu11-43/+33
There are some sparse warnings in the HNS drivers: warning: incorrect type in assignment (different address spaces) expected void [noderef] <asn:2> *io_base got void *vaddr warning: cast removes address space '<asn:2>' of expression [...] Add __iomem and change all the u8 __iomem to void __iomem to fix these kind of warnings. warning: incorrect type in argument 1 (different address spaces) expected void [noderef] <asn:2> *base got unsigned char [usertype] *base_addr warning: cast to restricted __le16 warning: incorrect type in assignment (different base types) expected unsigned int [usertype] tbl_tcam_data_high got restricted __le32 [usertype] warning: cast to restricted __le32 [...] These variables used u32/u16 as their type, and finally as a parameter of writel(), writel() will do the cpu_to_le32 coversion so remove the little endian covert code to fix these kind of warnings. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: hns: Fix WARNING when remove HNS driver with SMMU enabledYonglong Liu1-1/+3
When enable SMMU, remove HNS driver will cause a WARNING: [ 141.924177] WARNING: CPU: 36 PID: 2708 at drivers/iommu/dma-iommu.c:443 __iommu_dma_unmap+0xc0/0xc8 [ 141.954673] Modules linked in: hns_enet_drv(-) [ 141.963615] CPU: 36 PID: 2708 Comm: rmmod Tainted: G W 5.0.0-rc1-28723-gb729c57de95c-dirty #32 [ 141.983593] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI Nemo 1.8 RC0 08/31/2017 [ 142.000244] pstate: 60000005 (nZCv daif -PAN -UAO) [ 142.009886] pc : __iommu_dma_unmap+0xc0/0xc8 [ 142.018476] lr : __iommu_dma_unmap+0xc0/0xc8 [ 142.027066] sp : ffff000013533b90 [ 142.033728] x29: ffff000013533b90 x28: ffff8013e6983600 [ 142.044420] x27: 0000000000000000 x26: 0000000000000000 [ 142.055113] x25: 0000000056000000 x24: 0000000000000015 [ 142.065806] x23: 0000000000000028 x22: ffff8013e66eee68 [ 142.076499] x21: ffff8013db919800 x20: 0000ffffefbff000 [ 142.087192] x19: 0000000000001000 x18: 0000000000000007 [ 142.097885] x17: 000000000000000e x16: 0000000000000001 [ 142.108578] x15: 0000000000000019 x14: 363139343a70616d [ 142.119270] x13: 6e75656761705f67 x12: 0000000000000000 [ 142.129963] x11: 00000000ffffffff x10: 0000000000000006 [ 142.140656] x9 : 1346c1aa88093500 x8 : ffff0000114de4e0 [ 142.151349] x7 : 6662666578303d72 x6 : ffff0000105ffec8 [ 142.162042] x5 : 0000000000000000 x4 : 0000000000000000 [ 142.172734] x3 : 00000000ffffffff x2 : ffff0000114de500 [ 142.183427] x1 : 0000000000000000 x0 : 0000000000000035 [ 142.194120] Call trace: [ 142.199030] __iommu_dma_unmap+0xc0/0xc8 [ 142.206920] iommu_dma_unmap_page+0x20/0x28 [ 142.215335] __iommu_unmap_page+0x40/0x60 [ 142.223399] hnae_unmap_buffer+0x110/0x134 [ 142.231639] hnae_free_desc+0x6c/0x10c [ 142.239177] hnae_fini_ring+0x14/0x34 [ 142.246540] hnae_fini_queue+0x2c/0x40 [ 142.254080] hnae_put_handle+0x38/0xcc [ 142.261619] hns_nic_dev_remove+0x54/0xfc [hns_enet_drv] [ 142.272312] platform_drv_remove+0x24/0x64 [ 142.280552] device_release_driver_internal+0x17c/0x20c [ 142.291070] driver_detach+0x4c/0x90 [ 142.298259] bus_remove_driver+0x5c/0xd8 [ 142.306148] driver_unregister+0x2c/0x54 [ 142.314037] platform_driver_unregister+0x10/0x18 [ 142.323505] hns_nic_dev_driver_exit+0x14/0xf0c [hns_enet_drv] [ 142.335248] __arm64_sys_delete_module+0x214/0x25c [ 142.344891] el0_svc_common+0xb0/0x10c [ 142.352430] el0_svc_handler+0x24/0x80 [ 142.359968] el0_svc+0x8/0x7c0 [ 142.366104] ---[ end trace 60ad1cd58e63c407 ]--- The tx ring buffer map when xmit and unmap when xmit done. So in hnae_init_ring() did not map tx ring buffer, but in hnae_fini_ring() have a unmap operation for tx ring buffer, which is already unmapped when xmit done, than cause this WARNING. The hnae_alloc_buffers() is called in hnae_init_ring(), so the hnae_free_buffers() should be in hnae_fini_ring(), not in hnae_free_desc(). In hnae_fini_ring(), adds a check is_rx_ring() as in hnae_init_ring(). When the ring buffer is tx ring, adds a piece of code to ensure that the tx ring is unmap. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: hns: fix ICMP6 neighbor solicitation messages discard problemYonglong Liu1-6/+27
ICMP6 neighbor solicitation messages will be discard by the Hip06 chips, because of not setting forwarding pool. Enable promisc mode has the same problem. This patch fix the wrong forwarding table configs for the multicast vague matching when enable promisc mode, and add forwarding pool for the forwarding table. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: hns: Fix probabilistic memory overwrite when HNS driver initializedYonglong Liu1-1/+1
When reboot the system again and again, may cause a memory overwrite. [ 15.638922] systemd[1]: Reached target Swap. [ 15.667561] tun: Universal TUN/TAP device driver, 1.6 [ 15.676756] Bridge firewalling registered [ 17.344135] Unable to handle kernel paging request at virtual address 0000000200000040 [ 17.352179] Mem abort info: [ 17.355007] ESR = 0x96000004 [ 17.358105] Exception class = DABT (current EL), IL = 32 bits [ 17.364112] SET = 0, FnV = 0 [ 17.367209] EA = 0, S1PTW = 0 [ 17.370393] Data abort info: [ 17.373315] ISV = 0, ISS = 0x00000004 [ 17.377206] CM = 0, WnR = 0 [ 17.380214] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____) [ 17.386926] [0000000200000040] pgd=0000000000000000 [ 17.391878] Internal error: Oops: 96000004 [#1] SMP [ 17.396824] CPU: 23 PID: 95 Comm: kworker/u130:0 Tainted: G E 4.19.25-1.2.78.aarch64 #1 [ 17.414175] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.54 08/16/2018 [ 17.425615] Workqueue: events_unbound async_run_entry_fn [ 17.435151] pstate: 00000005 (nzcv daif -PAN -UAO) [ 17.444139] pc : __mutex_lock.isra.1+0x74/0x540 [ 17.453002] lr : __mutex_lock.isra.1+0x3c/0x540 [ 17.461701] sp : ffff000100d9bb60 [ 17.469146] x29: ffff000100d9bb60 x28: 0000000000000000 [ 17.478547] x27: 0000000000000000 x26: ffff802fb8945000 [ 17.488063] x25: 0000000000000000 x24: ffff802fa32081a8 [ 17.497381] x23: 0000000000000002 x22: ffff801fa2b15220 [ 17.506701] x21: ffff000009809000 x20: ffff802fa23a0888 [ 17.515980] x19: ffff801fa2b15220 x18: 0000000000000000 [ 17.525272] x17: 0000000200000000 x16: 0000000200000000 [ 17.534511] x15: 0000000000000000 x14: 0000000000000000 [ 17.543652] x13: ffff000008d95db8 x12: 000000000000000d [ 17.552780] x11: ffff000008d95d90 x10: 0000000000000b00 [ 17.561819] x9 : ffff000100d9bb90 x8 : ffff802fb89d6560 [ 17.570829] x7 : 0000000000000004 x6 : 00000004a1801d05 [ 17.579839] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.588852] x3 : ffff802fb89d5a00 x2 : 0000000000000000 [ 17.597734] x1 : 0000000200000000 x0 : 0000000200000000 [ 17.606631] Process kworker/u130:0 (pid: 95, stack limit = 0x(____ptrval____)) [ 17.617438] Call trace: [ 17.623349] __mutex_lock.isra.1+0x74/0x540 [ 17.630927] __mutex_lock_slowpath+0x24/0x30 [ 17.638602] mutex_lock+0x50/0x60 [ 17.645295] drain_workqueue+0x34/0x198 [ 17.652623] __sas_drain_work+0x7c/0x168 [ 17.659903] sas_drain_work+0x60/0x68 [ 17.666947] hisi_sas_scan_finished+0x30/0x40 [hisi_sas_main] [ 17.676129] do_scsi_scan_host+0x70/0xb0 [ 17.683534] do_scan_async+0x20/0x228 [ 17.690586] async_run_entry_fn+0x4c/0x1d0 [ 17.697997] process_one_work+0x1b4/0x3f8 [ 17.705296] worker_thread+0x54/0x470 Every time the call trace is not the same, but the overwrite address is always the same: Unable to handle kernel paging request at virtual address 0000000200000040 The root cause is, when write the reg XGMAC_MAC_TX_LF_RF_CONTROL_REG, didn't use the io_base offset. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: hns: Use NAPI_POLL_WEIGHT for hns driverYonglong Liu1-5/+2
When the HNS driver loaded, always have an error print: "netif_napi_add() called with weight 256" This is because the kernel checks the NAPI polling weights requested by drivers and it prints an error message if a driver requests a weight bigger than 64. So use NAPI_POLL_WEIGHT to fix it. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()Liubin Shu1-2/+3
This patch is trying to fix the issue due to: [27237.844750] BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x708/0xa18[hns_enet_drv] After hnae_queue_xmit() in hns_nic_net_xmit_hw(), can be interrupted by interruptions, and than call hns_nic_tx_poll_one() to handle the new packets, and free the skb. So, when turn back to hns_nic_net_xmit_hw(), calling skb->len will cause use-after-free. This patch update tx ring statistics in hns_nic_tx_poll_one() to fix the bug. Signed-off-by: Liubin Shu <shuliubin@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04arm64: fix wrong check of on_sdei_stack in nmi contextWei Li1-0/+6
When doing unwind_frame() in the context of pseudo nmi (need enable CONFIG_ARM64_PSEUDO_NMI), reaching the bottom of the stack (fp == 0, pc != 0), function on_sdei_stack() will return true while the sdei acpi table is not inited in fact. This will cause a "NULL pointer dereference" oops when going on. Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-04-04blk-mq: do not reset plug->rq_count before the list is sortedDongli Zhang1-1/+2
We would never be able to sort the list if we first reset plug->rq_count which is used in conditional check later. Fixes: ce5b009cff19 ("block: improve logic around when to sort a plug list") Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-04csky: Fix syscall_get_arguments() and syscall_set_arguments()Dmitry V. Levin1-4/+6
C-SKY syscall arguments are located in orig_a0,a1,a2,a3,regs[0],regs[1] fields of struct pt_regs. Due to an off-by-one bug and a bug in pointer arithmetic syscall_get_arguments() was reading orig_a0,regs[1..5] fields instead. Likewise, syscall_set_arguments() was writing orig_a0,regs[1..5] fields instead. Link: http://lkml.kernel.org/r/20190329171230.GB32456@altlinux.org Fixes: 4859bfca11c7d ("csky: System Call") Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org # v4.20+ Tested-by: Guo Ren <ren_guo@c-sky.com> Acked-by: Guo Ren <ren_guo@c-sky.com> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-04riscv: Fix syscall_get_arguments() and syscall_set_arguments()Dmitry V. Levin1-5/+7
RISC-V syscall arguments are located in orig_a0,a1..a5 fields of struct pt_regs. Due to an off-by-one bug and a bug in pointer arithmetic syscall_get_arguments() was reading s3..s7 fields instead of a1..a5. Likewise, syscall_set_arguments() was writing s3..s7 fields instead of a1..a5. Link: http://lkml.kernel.org/r/20190329171221.GA32456@altlinux.org Fixes: e2c0cdfba7f69 ("RISC-V: User-facing API") Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org # v4.15+ Acked-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-04flow_dissector: rst'ify documentationStanislav Fomichev3-115/+127
Rename bpf_flow_dissector.txt to bpf_flow_dissector.rst and fix formatting. Also, link it from the Documentation/networking/index.rst. Tested with 'make htmldocs' to make sure it looks reasonable. Fixes: ae82899bbe92 ("flow_dissector: document BPF flow dissector environment") Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments()Steven Rostedt (Red Hat)1-3/+6
The only users that calls syscall_get_arguments() with a variable and not a hard coded '6' is ftrace_syscall_enter(). syscall_get_arguments() can be optimized by removing a variable input, and always grabbing 6 arguments regardless of what the system call actually uses. Change ftrace_syscall_enter() to pass the 6 args into a local stack array and copy the necessary arguments into the trace event as needed. This is needed to remove two parameters from syscall_get_arguments(). Link: http://lkml.kernel.org/r/20161107213233.627583542@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-04ptrace: Remove maxargs from task_current_syscall()Steven Rostedt (Red Hat)3-43/+42
task_current_syscall() has a single user that passes in 6 for maxargs, which is the maximum arguments that can be used to get system calls from syscall_get_arguments(). Instead of passing in a number of arguments to grab, just get 6 arguments. The args argument even specifies that it's an array of 6 items. This will also allow changing syscall_get_arguments() to not get a variable number of arguments, but always grab 6. Linus also suggested not passing in a bunch of arguments to task_current_syscall() but to instead pass in a pointer to a structure, and just fill the structure. struct seccomp_data has almost all the parameters that is needed except for the stack pointer (sp). As seccomp_data is part of uapi, and I'm afraid to change it, a new structure was created "syscall_info", which includes seccomp_data and adds the "sp" field. Link: http://lkml.kernel.org/r/20161107213233.466776454@goodmis.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-04mm/compaction.c: abort search if isolation failsQian Cai1-1/+1
Running LTP oom01 in a tight loop or memory stress testing put the system in a low-memory situation could triggers random memory corruption like page flag corruption below due to in fast_isolate_freepages(), if isolation fails, next_search_order() does not abort the search immediately could lead to improper accesses. UBSAN: Undefined behaviour in ./include/linux/mm.h:1195:50 index 7 is out of range for type 'zone [5]' Call Trace: dump_stack+0x62/0x9a ubsan_epilogue+0xd/0x7f __ubsan_handle_out_of_bounds+0x14d/0x192 __isolate_free_page+0x52c/0x600 compaction_alloc+0x886/0x25f0 unmap_and_move+0x37/0x1e70 migrate_pages+0x2ca/0xb20 compact_zone+0x19cb/0x3620 kcompactd_do_work+0x2df/0x680 kcompactd+0x1d8/0x6c0 kthread+0x32c/0x3f0 ret_from_fork+0x35/0x40 ------------[ cut here ]------------ kernel BUG at mm/page_alloc.c:3124! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI RIP: 0010:__isolate_free_page+0x464/0x600 RSP: 0000:ffff888b9e1af848 EFLAGS: 00010007 RAX: 0000000030000000 RBX: ffff888c39fcf0f8 RCX: 0000000000000000 RDX: 1ffff111873f9e25 RSI: 0000000000000004 RDI: ffffed1173c35ef6 RBP: ffff888b9e1af898 R08: fffffbfff4fc2461 R09: fffffbfff4fc2460 R10: fffffbfff4fc2460 R11: ffffffffa7e12303 R12: 0000000000000008 R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000007 FS: 0000000000000000(0000) GS:ffff888ba8e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc7abc00000 CR3: 0000000752416004 CR4: 00000000001606a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: compaction_alloc+0x886/0x25f0 unmap_and_move+0x37/0x1e70 migrate_pages+0x2ca/0xb20 compact_zone+0x19cb/0x3620 kcompactd_do_work+0x2df/0x680 kcompactd+0x1d8/0x6c0 kthread+0x32c/0x3f0 ret_from_fork+0x35/0x40 Link: http://lkml.kernel.org/r/20190320192648.52499-1-cai@lca.pw Fixes: dbe2d4e4f12e ("mm, compaction: round-robin the order while searching the free lists for a target") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
2019-04-04mm/compaction.c: correct zone boundary handling when resetting pageblock skip hintsMel Gorman1-10/+17
Mikhail Gavrilo reported the following bug being triggered in a Fedora kernel based on 5.1-rc1 but it is relevant to a vanilla kernel. kernel: page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) kernel: ------------[ cut here ]------------ kernel: kernel BUG at include/linux/mm.h:1021! kernel: invalid opcode: 0000 [#1] SMP NOPTI kernel: CPU: 6 PID: 116 Comm: kswapd0 Tainted: G C 5.1.0-0.rc1.git1.3.fc31.x86_64 #1 kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1201 12/07/2018 kernel: RIP: 0010:__reset_isolation_pfn+0x244/0x2b0 kernel: Code: fe 06 e8 0f 8e fc ff 44 0f b6 4c 24 04 48 85 c0 0f 85 dc fe ff ff e9 68 fe ff ff 48 c7 c6 58 b7 2e 8c 4c 89 ff e8 0c 75 00 00 <0f> 0b 48 c7 c6 58 b7 2e 8c e8 fe 74 00 00 0f 0b 48 89 fa 41 b8 01 kernel: RSP: 0018:ffff9e2d03f0fde8 EFLAGS: 00010246 kernel: RAX: 0000000000000034 RBX: 000000000081f380 RCX: ffff8cffbddd6c20 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8cffbddd6c20 kernel: RBP: 0000000000000001 R08: 0000009898b94613 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000100000 kernel: R13: 0000000000100000 R14: 0000000000000001 R15: ffffca7de07ce000 kernel: FS: 0000000000000000(0000) GS:ffff8cffbdc00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007fc1670e9000 CR3: 00000007f5276000 CR4: 00000000003406e0 kernel: Call Trace: kernel: __reset_isolation_suitable+0x62/0x120 kernel: reset_isolation_suitable+0x3b/0x40 kernel: kswapd+0x147/0x540 kernel: ? finish_wait+0x90/0x90 kernel: kthread+0x108/0x140 kernel: ? balance_pgdat+0x560/0x560 kernel: ? kthread_park+0x90/0x90 kernel: ret_from_fork+0x27/0x50 He bisected it down to e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints"). The problem is that the patch in question was sloppy with respect to the handling of zone boundaries. In some instances, it was possible for PFNs outside of a zone to be examined and if those were not properly initialised or poisoned then it would trigger the VM_BUG_ON. This patch corrects the zone boundary issues when resetting pageblock skip hints and Mikhail reported that the bug did not trigger after 30 hours of testing. Link: http://lkml.kernel.org/r/20190327085424.GL3189@techsingularity.net Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints") Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Qian Cai <cai@lca.pw> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
2019-04-04arm/mach-at91/pm : fix possible object reference leakPeng Hao1-2/+4
of_find_device_by_node() takes a reference to the struct device when it finds a match via get_device. When returning error we should call put_device. Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
2019-04-03ipv6: Fix dangling pointer when ipv6 fragmentJunwei Hu1-1/+3
At the beginning of ip6_fragment func, the prevhdr pointer is obtained in the ip6_find_1stfragopt func. However, all the pointers pointing into skb header may change when calling skb_checksum_help func with skb->ip_summed = CHECKSUM_PARTIAL condition. The prevhdr pointe will be dangling if it is not reloaded after calling __skb_linearize func in skb_checksum_help func. Here, I add a variable, nexthdr_offset, to evaluate the offset, which does not changes even after calling __skb_linearize func. Fixes: 405c92f7a541 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment") Signed-off-by: Junwei Hu <hujunwei4@huawei.com> Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com> Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-03net-gro: Fix GRO flush when receiving a GSO packet.Steffen Klassert1-1/+1
Currently we may merge incorrectly a received GSO packet or a packet with frag_list into a packet sitting in the gro_hash list. skb_segment() may crash case because the assumptions on the skb layout are not met. The correct behaviour would be to flush the packet in the gro_hash list and send the received GSO packet directly afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders") sets NAPI_GRO_CB(skb)->flush in this case, but this is not checked before merging. This patch makes sure to check this flag and to not merge in that case. Fixes: d61d072e87c8e ("net-gro: avoid reorders") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-03scsi: lpfc: Fix missing wakeups on abort threadsJames Smart1-4/+3
Abort thread wakeups, on some wqe types, are not happening. The thread wakeup logic is dependent upon the LPFC_DRIVER_ABORTED flag. However, on these wqes, the completion handler running prior to the io completion routine ends up clearing the flag. Rework the wakeup logic to look at a non-null waitq element which must be set if the abort thread is waiting. This is reverting the change in the indicated patch. Fixes: c2017260eea2d ("scsi: lpfc: Rework locking on SCSI io completion") Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-03scsi: storvsc: Reduce default ring buffer size to 128 KbytesMichael Kelley1-1/+1
Reduce the default VMbus channel ring buffer size for storvsc SCSI devices from 1 Mbyte to 128 Kbytes. Measurements show that ring buffer sizes above 128 Kbytes do not increase performance even at very high IOPS rates, so don't waste the memory. Also remove the dependence on PAGE_SIZE, since the ring buffer size should not change on architectures where PAGE_SIZE is not 4 Kbytes. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-03scsi: storvsc: Fix calculation of sub-channel countMichael Kelley1-2/+11
When the number of sub-channels offered by Hyper-V is >= the number of CPUs in the VM, calculate the correct number of sub-channels. The current code produces one too many. This scenario arises only when the number of CPUs is artificially restricted (for example, with maxcpus=<n> on the kernel boot line), because Hyper-V normally offers a sub-channel count < number of CPUs. While the current code doesn't break, the extra sub-channel is unbalanced across the CPUs (for example, a total of 5 channels on a VM with 4 CPUs). Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-03scsi: core: add new RDAC LENOVO/DE_Series deviceXose Vazquez Perez2-0/+2
Blacklist "Universal Xport" LUN. It's used for in-band storage array management. Also add model to the rdac dh family. Cc: Martin Wilck <mwilck@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: NetApp RDAC team <ng-eseries-upstream-maintainers@netapp.com> Cc: Christophe Varoqui <christophe.varoqui@opensvc.com> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: SCSI ML <linux-scsi@vger.kernel.org> Cc: DM ML <dm-devel@redhat.com> Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-04drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplugChris Wilson1-1/+1
drivers/gpu/drm/i915/gvt/display.c:457: warning: Function parameter or member 'connected' not described in 'intel_vgpu_emulate_hotplug' drivers/gpu/drm/i915/gvt/display.c:457: warning: Excess function parameter 'conncted' description in 'intel_vgpu_emulate_hotplug' Fixes: 1ca20f33df42 ("drm/i915/gvt: add hotplug emulation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Hang Yuan <hang.yuan@linux.intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2019-04-04drm/i915/gvt: Correct the calculation of plane sizeXiong Zhang1-6/+2
stride isn't in unit of pixel, it is bytes, so calculation of plane size doesn't need to multiple bpp. Fixes: e546e281d33d ("drm/i915/gvt: Dmabuf support for GVT-g") Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2019-04-03vfio/type1: Limit DMA mappings per containerAlex Williamson1-0/+14
Memory backed DMA mappings are accounted against a user's locked memory limit, including multiple mappings of the same memory. This accounting bounds the number of such mappings that a user can create. However, DMA mappings that are not backed by memory, such as DMA mappings of device MMIO via mmaps, do not make use of page pinning and therefore do not count against the user's locked memory limit. These mappings still consume memory, but the memory is not well associated to the process for the purpose of oom killing a task. To add bounding on this use case, we introduce a limit to the total number of concurrent DMA mappings that a user is allowed to create. This limit is exposed as a tunable module option where the default value of 64K is expected to be well in excess of any reasonable use case (a large virtual machine configuration would typically only make use of tens of concurrent mappings). This fixes CVE-2019-3882. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-04-03vfio/spapr_tce: Make symbol 'tce_iommu_driver_ops' staticWang Hai1-1/+1
Fixes the following sparse warning: drivers/vfio/vfio_iommu_spapr_tce.c:1401:36: warning: symbol 'tce_iommu_driver_ops' was not declared. Should it be static? Fixes: 5ffd229c0273 ("powerpc/vfio: Implement IOMMU driver for VFIO") Signed-off-by: Wang Hai <wanghai26@huawei.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-04-03vfio/pci: use correct format charactersLouis Taylor1-2/+2
When compiling with -Wformat, clang emits the following warnings: drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Louis Taylor <louis@kragniz.eu> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-04-03paride/pf: Fix potential NULL pointer dereferenceYueHaibing1-1/+11
Syzkaller report this: pf: pf version 1.04, major 47, cluster 64, nice 0 pf: No ATAPI disk detected kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 0 PID: 9887 Comm: syz-executor.0 Tainted: G C 5.1.0-rc3+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:pf_init+0x7af/0x1000 [pf] Code: 46 77 d2 48 89 d8 48 c1 e8 03 80 3c 28 00 74 08 48 89 df e8 03 25 a6 d2 4c 8b 23 49 8d bc 24 80 05 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 74 05 e8 e6 24 a6 d2 49 8b bc 24 80 05 00 00 e8 79 34 RSP: 0018:ffff8881abcbf998 EFLAGS: 00010202 RAX: 00000000000000b0 RBX: ffffffffc1e4a8a8 RCX: ffffffffaec50788 RDX: 0000000000039b10 RSI: ffffc9000153c000 RDI: 0000000000000580 RBP: dffffc0000000000 R08: ffffed103ee44e59 R09: ffffed103ee44e59 R10: 0000000000000001 R11: ffffed103ee44e58 R12: 0000000000000000 R13: ffffffffc1e4b028 R14: 0000000000000000 R15: 0000000000000020 FS: 00007f1b78a91700(0000) GS:ffff8881f7200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6d72b207f8 CR3: 00000001d5790004 CR4: 00000000007606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? 0xffffffffc1e50000 do_one_initcall+0xbc/0x47d init/main.c:901 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f1b78a90c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00007f1b78a90c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1b78a916bc R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004 Modules linked in: pf(+) paride gpio_tps65218 tps65218 i2c_cht_wc ati_remote dc395x act_meta_skbtcindex act_ife ife ecdh_generic rc_xbox_dvd sky81452_regulator v4l2_fwnode leds_blinkm snd_usb_hiface comedi(C) aes_ti slhc cfi_cmdset_0020 mtd cfi_util sx8654 mdio_gpio of_mdio fixed_phy mdio_bitbang libphy alcor_pci matrix_keymap hid_uclogic usbhid scsi_transport_fc videobuf2_v4l2 videobuf2_dma_sg snd_soc_pcm179x_spi snd_soc_pcm179x_codec i2c_demux_pinctrl mdev snd_indigodj isl6405 mii enc28j60 cmac adt7316_i2c(C) adt7316(C) fmc_trivial fmc nf_reject_ipv4 authenc rc_dtt200u rtc_ds1672 dvb_usb_dibusb_mc dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb dvb_core videobuf2_common videobuf2_vmalloc videobuf2_memops regulator_haptic adf7242 mac802154 ieee802154 s5h1409 da9034_ts snd_intel8x0m wmi cx24120 usbcore sdhci_cadence sdhci_pltfm sdhci mmc_core joydev i2c_algo_bit scsi_transport_iscsi iscsi_boot_sysfs ves1820 lockd grace nfs_acl auth_rpcgss sunrp c ip_vs snd_soc_adau7002 snd_cs4281 snd_rawmidi gameport snd_opl3_lib snd_seq_device snd_hwdep snd_ac97_codec ad7418 hid_primax hid snd_soc_cs4265 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore ti_adc108s102 eeprom_93cx6 i2c_algo_pca mlxreg_hotplug st_pressure st_sensors industrialio_triggered_buffer kfifo_buf industrialio v4l2_common videodev media snd_soc_adau_utils rc_pinnacle_grey rc_core pps_gpio leds_lm3692x nandcore ledtrig_pattern iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic aes_x86_64 piix crypto_simd input_leds psmouse cryp td glue_helper ide_core intel_agp serio_raw intel_gtt agpgart ata_generic i2c_piix4 pata_acpi parport_pc parport rtc_cmos floppy sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: paride] Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 7a818cf5f210d79e ]--- If alloc_disk fails in pf_init_units, pf->disk will be NULL, however in pf_detect and pf_exit, it's not check this before free.It may result a NULL pointer dereference. Also when register_blkdev failed, blk_cleanup_queue() and blk_mq_free_tag_set() should be called to free resources. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 6ce59025f118 ("paride/pf: cleanup queues when detection fails") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-03io_uring: fix double free in case of fileset regitration failureJens Axboe1-0/+1
Will Deacon reported the following KASAN complaint: [ 149.890370] ================================================================== [ 149.891266] BUG: KASAN: double-free or invalid-free in io_sqe_files_unregister+0xa8/0x140 [ 149.892218] [ 149.892411] CPU: 113 PID: 3974 Comm: io_uring_regist Tainted: G B 5.1.0-rc3-00012-g40b114779944 #3 [ 149.893623] Hardware name: linux,dummy-virt (DT) [ 149.894169] Call trace: [ 149.894539] dump_backtrace+0x0/0x228 [ 149.895172] show_stack+0x14/0x20 [ 149.895747] dump_stack+0xe8/0x124 [ 149.896335] print_address_description+0x60/0x258 [ 149.897148] kasan_report_invalid_free+0x78/0xb8 [ 149.897936] __kasan_slab_free+0x1fc/0x228 [ 149.898641] kasan_slab_free+0x10/0x18 [ 149.899283] kfree+0x70/0x1f8 [ 149.899798] io_sqe_files_unregister+0xa8/0x140 [ 149.900574] io_ring_ctx_wait_and_kill+0x190/0x3c0 [ 149.901402] io_uring_release+0x2c/0x48 [ 149.902068] __fput+0x18c/0x510 [ 149.902612] ____fput+0xc/0x18 [ 149.903146] task_work_run+0xf0/0x148 [ 149.903778] do_notify_resume+0x554/0x748 [ 149.904467] work_pending+0x8/0x10 [ 149.905060] [ 149.905331] Allocated by task 3974: [ 149.905934] __kasan_kmalloc.isra.0.part.1+0x48/0xf8 [ 149.906786] __kasan_kmalloc.isra.0+0xb8/0xd8 [ 149.907531] kasan_kmalloc+0xc/0x18 [ 149.908134] __kmalloc+0x168/0x248 [ 149.908724] __arm64_sys_io_uring_register+0x2b8/0x15a8 [ 149.909622] el0_svc_common+0x100/0x258 [ 149.910281] el0_svc_handler+0x48/0xc0 [ 149.910928] el0_svc+0x8/0xc [ 149.911425] [ 149.911696] Freed by task 3974: [ 149.912242] __kasan_slab_free+0x114/0x228 [ 149.912955] kasan_slab_free+0x10/0x18 [ 149.913602] kfree+0x70/0x1f8 [ 149.914118] __arm64_sys_io_uring_register+0xc2c/0x15a8 [ 149.915009] el0_svc_common+0x100/0x258 [ 149.915670] el0_svc_handler+0x48/0xc0 [ 149.916317] el0_svc+0x8/0xc [ 149.916817] [ 149.917101] The buggy address belongs to the object at ffff8004ce07ed00 [ 149.917101] which belongs to the cache kmalloc-128 of size 128 [ 149.919197] The buggy address is located 0 bytes inside of [ 149.919197] 128-byte region [ffff8004ce07ed00, ffff8004ce07ed80) [ 149.921142] The buggy address belongs to the page: [ 149.921953] page:ffff7e0013381f00 count:1 mapcount:0 mapping:ffff800503417c00 index:0x0 compound_mapcount: 0 [ 149.923595] flags: 0x1ffff00000010200(slab|head) [ 149.924388] raw: 1ffff00000010200 dead000000000100 dead000000000200 ffff800503417c00 [ 149.925706] raw: 0000000000000000 0000000080400040 00000001ffffffff 0000000000000000 [ 149.927011] page dumped because: kasan: bad access detected [ 149.927956] [ 149.928224] Memory state around the buggy address: [ 149.929054] ffff8004ce07ec00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 149.930274] ffff8004ce07ec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 149.931494] >ffff8004ce07ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 149.932712] ^ [ 149.933281] ffff8004ce07ed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 149.934508] ffff8004ce07ee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 149.935725] ================================================================== which is due to a failure in registrering a fileset. This frees the ctx->user_files pointer, but doesn't clear it. When the io_uring instance is later freed through the normal channels, we free this pointer again. At this point it's invalid. Ensure we clear the pointer when we free it for the error case. Reported-by: Will Deacon <will.deacon@arm.com> Tested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>