linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-03-15	net sched actions: return explicit error when tunnel_key mode is not specified	Roman Mashak	1	-0/+1
	If set/unset mode of the tunnel_key action is not provided, ->init() still returns 0, and the caller proceeds with bogus 'struct tc_action *' object, this results in crash: % tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1 [ 35.805515] general protection fault: 0000 [#1] SMP PTI [ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd serio_raw [ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286 [ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190 [ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206 [ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000 [ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910 [ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808 [ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38 [ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910 [ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000) knlGS:0000000000000000 [ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0 [ 35.816457] Call Trace: [ 35.817158] tc_ctl_action+0x11a/0x220 [ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0 [ 35.818457] ? __slab_alloc+0x1c/0x30 [ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0 [ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0 [ 35.820231] netlink_rcv_skb+0xce/0x100 [ 35.820744] netlink_unicast+0x164/0x220 [ 35.821500] netlink_sendmsg+0x293/0x370 [ 35.822040] sock_sendmsg+0x30/0x40 [ 35.822508] ___sys_sendmsg+0x2c5/0x2e0 [ 35.823149] ? pagecache_get_page+0x27/0x220 [ 35.823714] ? filemap_fault+0xa2/0x640 [ 35.824423] ? page_add_file_rmap+0x108/0x200 [ 35.825065] ? alloc_set_pte+0x2aa/0x530 [ 35.825585] ? finish_fault+0x4e/0x70 [ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0 [ 35.826723] ? __sys_sendmsg+0x41/0x70 [ 35.827230] __sys_sendmsg+0x41/0x70 [ 35.827710] do_syscall_64+0x68/0x120 [ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 35.828859] RIP: 0033:0x7f3d0ca4da67 [ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67 [ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003 [ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000 [ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000 [ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640 [ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2 fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48 8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c [ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0 [ 35.838291] ---[ end trace a095c06ee4b97a26 ]--- Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-15	net/smc: simplify wait when closing listen socket	Ursula Braun	2	-26/+3
	Closing of a listen socket wakes up kernel_accept() of smc_tcp_listen_worker(), and then has to wait till smc_tcp_listen_worker() gives up the internal clcsock. The wait logic introduced with commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") might wait longer than necessary. This patch implements the idea to implement the wait just with flush_work(), and gets rid of the extra smc_close_wait_listen_clcsock() function. Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Reported-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	sunvnet: does not support GSO for sctp	Cathy Zhou	1	-1/+1
	The NETIF_F_GSO_SOFTWARE implies support for GSO on SCTP, but the sunvnet driver does not support GSO for sctp. Here we remove the NETIF_F_GSO_SOFTWARE feature flag and only report NETIF_F_ALL_TSO instead. Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	tg3: prevent scheduling while atomic splat	Jonathan Toppins	1	-1/+1
	The problem was introduced in commit 506b0a395f26 ("[netdrv] tg3: APE heartbeat changes"). The bug occurs because tp->lock spinlock is held which is obtained in tg3_start by way of tg3_full_lock(), line 11571. The documentation for usleep_range() specifically states it cannot be used inside a spinlock. Fixes: 506b0a395f26 ("[netdrv] tg3: APE heartbeat changes") Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu	Sabrina Dubroca	5	-10/+32
	Prior to the rework of PMTU information storage in commit 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer."), when a PMTU event advertising a PMTU smaller than net.ipv4.route.min_pmtu was received, we would disable setting the DF flag on packets by locking the MTU metric, and set the PMTU to net.ipv4.route.min_pmtu. Since then, we don't disable DF, and set PMTU to net.ipv4.route.min_pmtu, so the intermediate router that has this link with a small MTU will have to drop the packets. This patch reestablishes pre-2.6.39 behavior by splitting rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu. rt_mtu_locked indicates that we shouldn't set the DF bit on that path, and is checked in ip_dont_fragment(). One possible workaround is to set net.ipv4.route.min_pmtu to a value low enough to accommodate the lowest MTU encountered. Fixes: 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	dpaa_eth: remove duplicate increment of the tx_errors counter	Camelia Groza	1	-1/+0
	The tx_errors counter is incremented by the dpaa_xmit caller. Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	dpaa_eth: increment the RX dropped counter when needed	Camelia Groza	1	-1/+3
	Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	dpaa_eth: remove duplicate initialization	Camelia Groza	1	-1/+0
	The fd_format has already been initialized at this point. Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	dpaa_eth: fix error in dpaa_remove()	Madalin Bucur	1	-1/+1
	The recent changes that make the driver probing compatible with DSA were not propagated in the dpa_remove() function, breaking the module unload function. Using the proper device to address the issue. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	soc/fsl/qbman: fix issue in qman_delete_cgr_safe()	Madalin Bucur	1	-23/+5
	The wait_for_completion() call in qman_delete_cgr_safe() was triggering a scheduling while atomic bug, replacing the kthread with a smp_call_function_single() call to fix it. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: Roy Pledge <roy.pledge@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	net: use skb_to_full_sk() in skb_update_prio()	Eric Dumazet	2	-9/+17
	Andrei Vagin reported a KASAN: slab-out-of-bounds error in skb_update_prio() Since SYNACK might be attached to a request socket, we need to get back to the listener socket. Since this listener is manipulated without locks, add const qualifiers to sock_cgroup_prioidx() so that the const can also be used in skb_update_prio() Also add the const qualifier to sock_cgroup_classid() for consistency. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14	can: cc770: Fix queue stall & dropped RTR reply	Andri Yngvason	2	-28/+68
	While waiting for the TX object to send an RTR, an external message with a matching id can overwrite the TX data. In this case we must call the rx routine and then try transmitting the message that was overwritten again. The queue was being stalled because the RX event did not generate an interrupt to wake up the queue again and the TX event did not happen because the TXRQST flag is reset by the chip when new data is received. According to the CC770 datasheet the id of a message object should not be changed while the MSGVAL bit is set. This has been fixed by resetting the MSGVAL bit before modifying the object in the transmit function and setting it after. It is not enough to set & reset CPUUPD. It is important to keep the MSGVAL bit reset while the message object is being modified. Otherwise, during RTR transmission, a frame with matching id could trigger an rx-interrupt, which would cause a race condition between the interrupt routine and the transmit function. Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> Tested-by: Richard Weinberger <richard@nod.at> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-14	can: cc770: Fix stalls on rt-linux, remove redundant IRQ ack	Andri Yngvason	1	-15/+0
	This has been reported to cause stalls on rt-linux. Suggested-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-13	qed: Use after free in qed_rdma_free()	Dan Carpenter	1	-1/+1
	We're dereferencing "p_hwfn->p_rdma_info" but that is freed on the line before in qed_rdma_resc_free(p_hwfn). Fixes: 9de506a547c0 ("qed: Free RoCE ILT Memory on rmmod qedr") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-13	net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()	Greg Hackmann	1	-1/+1
	f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a __this_cpu_read() call inside ipcomp_alloc_tfms(). At the time, __this_cpu_read() required the caller to either not care about races or to handle preemption/interrupt issues. 3.15 tightened the rules around some per-cpu operations, and now __this_cpu_read() should never be used in a preemptible context. On 3.15 and later, we need to use this_cpu_read() instead. syzkaller reported this leading to the following kernel BUG while fuzzing sendmsg: BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101 caller is ipcomp_init_state+0x185/0x990 CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0xb9/0x115 check_preemption_disabled+0x1cb/0x1f0 ipcomp_init_state+0x185/0x990 ? __xfrm_init_state+0x876/0xc20 ? lock_downgrade+0x5e0/0x5e0 ipcomp4_init_state+0xaa/0x7c0 __xfrm_init_state+0x3eb/0xc20 xfrm_init_state+0x19/0x60 pfkey_add+0x20df/0x36f0 ? pfkey_broadcast+0x3dd/0x600 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_seq_stop+0x80/0x80 ? __skb_clone+0x236/0x750 ? kmem_cache_alloc+0x1f6/0x260 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_process+0x62a/0x6f0 pfkey_process+0x62a/0x6f0 ? pfkey_send_new_mapping+0x11c0/0x11c0 ? mutex_lock_io_nested+0x1390/0x1390 pfkey_sendmsg+0x383/0x750 ? dump_sp+0x430/0x430 sock_sendmsg+0xc0/0x100 ___sys_sendmsg+0x6c8/0x8b0 ? copy_msghdr_from_user+0x3b0/0x3b0 ? pagevec_lru_move_fn+0x144/0x1f0 ? find_held_lock+0x32/0x1c0 ? do_huge_pmd_anonymous_page+0xc43/0x11e0 ? lock_downgrade+0x5e0/0x5e0 ? get_kernel_page+0xb0/0xb0 ? _raw_spin_unlock+0x29/0x40 ? do_huge_pmd_anonymous_page+0x400/0x11e0 ? __handle_mm_fault+0x553/0x2460 ? __fget_light+0x163/0x1f0 ? __sys_sendmsg+0xc7/0x170 __sys_sendmsg+0xc7/0x170 ? SyS_shutdown+0x1a0/0x1a0 ? __do_page_fault+0x5a0/0xca0 ? lock_downgrade+0x5e0/0x5e0 SyS_sendmsg+0x27/0x40 ? __sys_sendmsg+0x170/0x170 do_syscall_64+0x19f/0x640 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x7f0ee73dfb79 RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79 RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004 RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0 R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440 R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-12	net: dsa: Fix dsa_is_user_port() test inversion	Florian Fainelli	1	-1/+1
	During the conversion to dsa_is_user_port(), a condition ended up being reversed, which would prevent the creation of any user port when using the legacy binding and/or platform data, fix that. Fixes: 4a5b85ffe2a0 ("net: dsa: use dsa_is_user_port everywhere") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	l2tp: fix races with ipv4-mapped ipv6 addresses	Paolo Abeni	2	-23/+18
	The l2tp_tunnel_create() function checks for v4mapped ipv6 sockets and cache that flag, so that l2tp core code can reusing it at xmit time. If the socket is provided by the userspace, the connection status of the tunnel sockets can change between the tunnel creation and the xmit call, so that syzbot is able to trigger the following splat: BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:192 [inline] BUG: KASAN: use-after-free in ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264 Read of size 8 at addr ffff8801bd949318 by task syz-executor4/23448 CPU: 0 PID: 23448 Comm: syz-executor4 Not tainted 4.16.0-rc4+ #65 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23c/0x360 mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 ip6_dst_idev include/net/ip6_fib.h:192 [inline] ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 l2tp_xmit_core net/l2tp/l2tp_core.c:1053 [inline] l2tp_xmit_skb+0x105f/0x1410 net/l2tp/l2tp_core.c:1148 pppol2tp_sendmsg+0x470/0x670 net/l2tp/l2tp_ppp.c:341 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg+0xca/0x110 net/socket.c:640 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2046 __sys_sendmsg+0xe5/0x210 net/socket.c:2080 SYSC_sendmsg net/socket.c:2091 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2087 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x453e69 RSP: 002b:00007f819593cc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f819593d6d4 RCX: 0000000000453e69 RDX: 0000000000000081 RSI: 000000002037ffc8 RDI: 0000000000000004 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000004c3 R14: 00000000006f72e8 R15: 0000000000000000 This change addresses the issues: * explicitly checking for TCP_ESTABLISHED for user space provided sockets * dropping the v4mapped flag usage - it can become outdated - and explicitly invoking ipv6_addr_v4mapped() instead The issue is apparently there since ancient times. v1 -> v2: (many thanks to Guillaume) - with csum issue introduced in v1 - replace pr_err with pr_debug - fix build issue with IPV6 disabled - move l2tp_sk_is_v4mapped in l2tp_core.c v2 -> v3: - don't update inet_daddr for v4mapped address, unneeded - drop rendundant check at creation time Reported-and-tested-by: syzbot+92fa328176eb07e4ac1a@syzkaller.appspotmail.com Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	net: ipv6: keep sk status consistent after datagram connect failure	Paolo Abeni	1	-7/+14
	On unsuccesful ip6_datagram_connect(), if the failure is caused by ip6_datagram_dst_update(), the sk peer information are cleared, but the sk->sk_state is preserved. If the socket was already in an established status, the overall sk status is inconsistent and fouls later checks in datagram code. Fix this saving the old peer information and restoring them in case of failure. This also aligns ipv6 datagram connect() behavior with ipv4. v1 -> v2: - added missing Fixes tag Fixes: 85cb73ff9b74 ("net: ipv6: reset daddr and dport in sk if connect() fails") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	e1000e: Fix link check race condition	Benjamin Poirier	2	-21/+24
	Alex reported the following race condition: /* link goes up... interrupt... schedule watchdog / \ e1000_watchdog_task \ e1000e_has_link \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link \ e1000e_phy_has_link_generic(..., &link) link = true / link goes down... interrupt / \ e1000_msix_other hw->mac.get_link_status = true / link is up / mac->get_link_status = false link_active = true / link_active is true, wrongly, and stays so because * get_link_status is false */ Avoid this problem by making sure that we don't set get_link_status = false after having checked the link. It seems this problem has been present since the introduction of e1000e. Link: https://lkml.org/lkml/2018/1/29/338 Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12	Revert "e1000e: Separate signaling for link check/link up"	Benjamin Poirier	3	-19/+9
	This reverts commit 19110cfbb34d4af0cdfe14cd243f3b09dc95b013. This reverts commit 4110e02eb45ea447ec6f5459c9934de0a273fb91. This reverts commit d3604515c9eda464a92e8e67aae82dfe07fe3c98. Commit 19110cfbb34d ("e1000e: Separate signaling for link check/link up") changed what happens to the link status when there is an error which happens after "get_link_status = false" in the copper check_for_link callbacks. Previously, such an error would be ignored and the link considered up. After that commit, any error implies that the link is down. Revert commit 19110cfbb34d ("e1000e: Separate signaling for link check/link up") and its followups. After reverting, the race condition described in the log of commit 19110cfbb34d is reintroduced. It may still be triggered by LSC events but this should keep the link down in case the link is electrically unstable, as discussed. The race may no longer be triggered by RXO events because commit 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") restored reading icr in the Other handler. Link: https://lkml.org/lkml/2018/3/1/789 Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12	sock_diag: request _diag module only when the family or proto has been registered	Xin Long	6	-10/+33
	Now when using 'ss' in iproute, kernel would try to load all _diag modules, which also causes corresponding family and proto modules to be loaded as well due to module dependencies. Like after running 'ss', sctp, dccp, af_packet (if it works as a module) would be loaded. For example: $ lsmod\|grep sctp $ ss $ lsmod\|grep sctp sctp_diag 16384 0 sctp 323584 5 sctp_diag inet_diag 24576 4 raw_diag,tcp_diag,sctp_diag,udp_diag libcrc32c 16384 3 nf_conntrack,nf_nat,sctp As these family and proto modules are loaded unintentionally, it could cause some problems, like: - Some debug tools use 'ss' to collect the socket info, which loads all those diag and family and protocol modules. It's noisy for identifying issues. - Users usually expect to drop sctp init packet silently when they have no sense of sctp protocol instead of sending abort back. - It wastes resources (especially with multiple netns), and SCTP module can't be unloaded once it's loaded. ... In short, it's really inappropriate to have these family and proto modules loaded unexpectedly when just doing debugging with inet_diag. This patch is to introduce sock_load_diag_module() where it loads the _diag module only when it's corresponding family or proto has been already registered. Note that we can't just load _diag module without the family or proto loaded, as some symbols used in _diag module are from the family or proto module. v1->v2: - move inet proto check to inet_diag to avoid a compiling err. v2->v3: - define sock_load_diag_module in sock.c and export one symbol only. - improve the changelog. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Phil Sutter <phil@nwl.cc> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().	Michael Chan	1	-0/+3
	During initialization, if we encounter errors, there is a code path that calls bnxt_hwrm_vnic_set_tpa() with invalid VNIC ID. This may cause a warning in firmware logs. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	bnxt_en: close & open NIC, only when the interface is in running state.	Venkat Duvvuru	1	-5/+12
	bnxt_restore_pf_fw_resources routine frees PF resources by calling close_nic and allocates the resources back, by doing open_nic. However, this is not needed, if the PF is already in closed state. This bug causes the driver to call open the device and call request_irq() when it is not needed. Ultimately, pci_disable_msix() will crash when bnxt_en is unloaded. This patch fixes the problem by skipping __bnxt_close_nic and __bnxt_open_nic inside bnxt_restore_pf_fw_resources routine, if the interface is not running. Fixes: 80fcaf46c092 ("bnxt_en: Restore MSIX after disabling SRIOV.") Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	bnxt_en: Return standard Linux error codes for hwrm flow cmds.	Venkat Duvvuru	1	-3/+20
	Currently, internal error value is returned by the driver, when hwrm_cfa_flow_alloc() fails due lack of resources. We should be returning Linux errno value -ENOSPC instead. This patch also converts other similar command errors to standard Linux errno code (-EIO) in bnxt_tc.c Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds") Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	bnxt_en: Fix regressions when setting up MQPRIO TX rings.	Michael Chan	1	-2/+5
	Recent changes added the bnxt_init_int_mode() call in the driver's open path whenever ring reservations are changed. This call was previously only called in the probe path. In the open path, if MQPRIO TC has been setup, the bnxt_init_int_mode() call would reset and mess up the MQPRIO per TC rings. Fix it by not re-initilizing bp->tx_nr_rings_per_tc in bnxt_init_int_mode(). Instead, initialize it in the probe path only after the bnxt_init_int_mode() call. Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	bnxt_en: Pass complete VLAN TCI to the stack.	Michael Chan	2	-2/+3
	When receiving a packet with VLAN tag, pass the entire 16-bit TCI to the stack when calling __vlan_hwaccel_put_tag(). The current code is only passing the 12-bit tag and it is missing the priority bits. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	bnxt_en: Remove unwanted ovs-offload messages in some conditions	Sriharsha Basavapatna	1	-8/+2
	In some conditions when the driver fails to add a flow in HW and returns an error back to the stack, the stack continues to invoke get_flow_stats() and/or del_flow() on it. The driver fails these APIs with an error message "no flow_node for cookie". The message gets logged repeatedly as long as the stack keeps invoking these functions. Fix this by removing the corresponding netdev_info() calls from these functions. Fixes: d7bc73053024 ("bnxt_en: add code to query TC flower offload stats") Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	bnxt_en: Fix vnic accounting in the bnxt_check_rings() path.	Eddie Wai	1	-44/+20
	The number of vnics to check must be determined ahead of time because only standard RX rings require vnics to support RFS. The logic is similar to the ring reservation logic and we can now use the refactored common functions to do most of the work in setting up the firmware message. Fixes: 8f23d638b36b ("bnxt_en: Expand bnxt_check_rings() to check all resources.") Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	bnxt_en: Refactor the functions to reserve hardware rings.	Michael Chan	1	-32/+53
	The bnxt_hwrm_reserve_{pf\|vf}_rings() functions are very similar to the bnxt_hwrm_check_{pf\|vf}_rings() functions. Refactor the former so that the latter can make use of common code in the next patch. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	net: phy: Tell caller result of phy_change()	Brad Mouring	2	-74/+72
	In 664fcf123a30e (net: phy: Threaded interrupts allow some simplification) the phy_interrupt system was changed to use a traditional threaded interrupt scheme instead of a workqueue approach. With this change, the phy status check moved into phy_change, which did not report back to the caller whether or not the interrupt was handled. This means that, in the case of a shared phy interrupt, only the first phydev's interrupt registers are checked (since phy_interrupt() would always return IRQ_HANDLED). This leads to interrupt storms when it is a secondary device that's actually the interrupt source. Signed-off-by: Brad Mouring <brad.mouring@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12	can: m_can: select pinctrl state in each suspend/resume function	Bich HEMON	1	-0/+5
	Make sure to apply the correct pin state in suspend/resume callbacks. Putting pins in sleep state saves power. Signed-off-by: Bich Hemon <bich.hemon@st.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-12	can: peak/pcie_fd: remove useless code when interface starts	Stephane Grosjean	1	-11/+2
	When an interface starts, the echo_skb array is empty and the network queue should be started only. This patch replaces useless code and locks when the internal RX_BARRIER message is received from the IP core, telling the driver that tx may start. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-12	can: peak/pcie_fd: fix echo_skb is occupied! bug	Stephane Grosjean	2	-8/+12
	This patch makes atomic the handling of the linux-can echo_skb array and the network tx queue. This prevents from the "BUG! echo_skb is occupied!" message to be printed by the linux-can core, in SMP environments. Reported-by: Diana Burgess <diana@peloton-tech.com> Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-12	can: ifi: Repair the error handling	Marek Vasut	1	-27/+37
	The new version of the IFI CANFD core has significantly less complex error state indication logic. In particular, the warning/error state bits are no longer all over the place, but are all present in the STATUS register. Moreover, there is a new IRQ register bit indicating transition between error states (active/warning/passive/busoff). This patch makes use of this bit to weed out the obscure selective INTERRUPT register clearing, which was used to carry over the error state indication into the poll function. While at it, this patch fixes the handling of the ACTIVE state, since the hardware provides indication of the core being in ACTIVE state and that in turn fixes the state transition indication toward userspace. Finally, register reads in the poll function are moved to the matching subfunctions since those are also no longer needed in the poll function. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Heiko Schocher <hs@denx.de> Cc: Markus Marb <markus@marb.org> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-12	can: ifi: Check core revision upon probe	Marek Vasut	1	-1/+10
	Older versions of the core are not compatible with the driver due to various intrusive fixes of the core. Read out the VER register, check the core revision bitfield and verify if the core in use is new enough (rev 2.1 or newer) to work correctly with this driver. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Heiko Schocher <hs@denx.de> Cc: Markus Marb <markus@marb.org> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-12	can: m_can: change comparison to bitshift when dealing with a mask	Wolfram Sang	1	-1/+1
	Due to a typo, the mask was destroyed by a comparison instead of a bit shift. Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-11	openvswitch: meter: fix the incorrect calculation of max delta_t	zhangliping	1	-3/+9
	Max delat_t should be the full_bucket/rate instead of the full_bucket. Also report EINVAL if the rate is zero. Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure") Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: zhangliping <zhangliping02@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-11	macvlan: filter out unsupported feature flags	Shannon Nelson	1	-1/+1
	Adding a macvlan device on top of a lowerdev that supports the xfrm offloads fails with a new regression: # ip link add link ens1f0 mv0 type macvlan RTNETLINK answers: Operation not permitted Tracing down the failure shows that the macvlan device inherits the NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM feature flags from the lowerdev, but with no dev->xfrmdev_ops API filled in, it doesn't actually support xfrm. When the request is made to add the new macvlan device, the XFRM listener for NETDEV_REGISTER calls xfrm_api_check() which fails the new registration because dev->xfrmdev_ops is NULL. The macvlan creation succeeds when we filter out the ESP feature flags in macvlan_fix_features(), so let's filter them out like we're already filtering out ~NETIF_F_NETNS_LOCAL. When XFRM support is added in the future, we can add the flags into MACVLAN_FEATURES. This same problem could crop up in the future with any other new feature flags, so let's filter out any flags that aren't defined as supported in macvlan. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-11	netfilter: nf_tables: release flowtable hooks	Pablo Neira Ayuso	1	-0/+1
	Otherwise we leak this array. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-11	netfilter: bridge: ebt_among: add more missing match size checks	Florian Westphal	1	-0/+34
	ebt_among is special, it has a dynamic match size and is exempt from the central size checks. commit c4585a2823edf ("bridge: ebt_among: add missing match size checks") added validation for pool size, but missed fact that the macros ebt_among_wh_src/dst can already return out-of-bound result because they do not check value of wh_src/dst_ofs (an offset) vs. the size of the match that userspace gave to us. v2: check that offset has correct alignment. Paolo Abeni points out that we should also check that src/dst wormhash arrays do not overlap, and src + length lines up with start of dst (or vice versa). v3: compact wormhash_sizes_valid() part NB: Fixes tag is intentionally wrong, this bug exists from day one when match was added for 2.6 kernel. Tag is there so stable maintainers will notice this one too. Tested with same rules from the earlier patch. Fixes: c4585a2823edf ("bridge: ebt_among: add missing match size checks") Reported-by: <syzbot+bdabab6f1983a03fc009@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-11	netfilter: x_tables: add and use xt_check_proc_name	Florian Westphal	4	-9/+45
	recent and hashlimit both create /proc files, but only check that name is 0 terminated. This can trigger WARN() from procfs when name is "" or "/". Add helper for this and then use it for both. Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: <syzbot+0502b00edac2a0680b61@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-11	netfilter: ebtables: fix erroneous reject of last rule	Florian Westphal	1	-1/+5
	The last rule in the blob has next_entry offset that is same as total size. This made "ebtables32 -A OUTPUT -d de:ad:be:ef:01:02" fail on 64 bit kernel. Fixes: b71812168571fa ("netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-09	ip6erspan: make sure enough headroom at xmit.	William Tu	1	-0/+3
	The patch adds skb_cow_header() to ensure enough headroom at ip6erspan_tunnel_xmit before pushing the erspan header to the skb. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09	ip6erspan: improve error handling for erspan version number.	William Tu	1	-0/+2
	When users fill in incorrect erspan version number through the struct erspan_metadata uapi, current code skips pushing the erspan header but continue pushing the gre header, which is incorrect. The patch fixes it by returning error. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09	ip6gre: add erspan v2 to tunnel lookup	William Tu	1	-1/+2
	The patch adds the erspan v2 proto in ip6gre_tunnel_lookup so the erspan v2 tunnel can be found correctly. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09	mlxsw: spectrum: Prevent duplicate mirrors	Petr Machata	2	-4/+27
	The Spectrum ASIC doesn't support mirroring more than once from a single binding point (which is a port-direction pair). Therefore detect that a second binding of a given binding point is attempted. To that end, extend struct mlxsw_sp_span_inspected_port to track whether a given binding point is bound or not. Extend mlxsw_sp_span_entry_port_find() to look for ports based on the full unique key: port number, direction, and boundness. Besides fixing the overt bug where configured mirrors are not offloaded, this also fixes a more subtle bug: mlxsw_sp_span_inspected_port_del() just defers to mlxsw_sp_span_entry_bound_port_find(), and that used to find the first port with the right number (disregarding the type). Thus by adding and removing egress and ingress mirrors in the right order, one could trick the system into believing it has no egress mirrors when in fact it did have some. That then caused that mlxsw_sp_span_port_mtu_update() didn't update mirroring buffer when MTU was changed. Fixes: 763b4b70afcd ("mlxsw: spectrum: Add support in matchall mirror TC offloading") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09	mlxsw: spectrum: Fix gact_ok offloading	Jiri Pirko	5	-1/+19
	For ok GACT action, TERMINATE binding_cmd should be used in action set passed down to HW. Fixes: b2925957ec1a9 ("mlxsw: spectrum_flower: Offload "ok" termination action") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09	vhost_net: examine pointer types during un-producing	Jason Wang	3	-2/+7
	After commit fc72d1d54dd9 ("tuntap: XDP transmission"), we can actually queueing XDP pointers in the pointer ring, so we should examine the pointer type before freeing the pointer. Fixes: fc72d1d54dd9 ("tuntap: XDP transmission") Reported-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09	vhost_net: keep private_data and rx_ring synced	Jason Wang	1	-2/+3
	We get pointer ring from the exported sock, this means we should keep rx_ring and vq->private synced during both vq stop and backend set, otherwise we may see stale rx_ring. Fixes: c67df11f6e480 ("vhost_net: try batch dequing from skb array") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09	vhost_net: initialize rx_ring in vhost_net_open()	Alexander Potapenko	1	-0/+1
	KMSAN reported a use of uninit memory in vhost_net_buf_unproduce() while trying to access n->vqs[VHOST_NET_VQ_TX].rx_ring: ================================================================== BUG: KMSAN: use of uninitialized memory in vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vho et.c:170 CPU: 0 PID: 3021 Comm: syz-fuzzer Not tainted 4.16.0-rc4+ #3853 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x1f0 mm/kmsan/kmsan.c:1093 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vhost/net.c:170 vhost_net_stop_vq drivers/vhost/net.c:974 [inline] vhost_net_stop+0x146/0x380 drivers/vhost/net.c:982 vhost_net_release+0xb1/0x4f0 drivers/vhost/net.c:1015 __fput+0x49f/0xa00 fs/file_table.c:209 ____fput+0x37/0x40 fs/file_table.c:243 task_work_run+0x243/0x2c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:191 [inline] exit_to_usermode_loop arch/x86/entry/common.c:166 [inline] prepare_exit_to_usermode+0x349/0x3b0 arch/x86/entry/common.c:196 syscall_return_slowpath+0xf3/0x6d0 arch/x86/entry/common.c:265 do_syscall_64+0x34d/0x450 arch/x86/entry/common.c:292 ... origin: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213 kmsan_kmalloc_large+0x6f/0xd0 mm/kmsan/kmsan.c:392 kmalloc_large_node_hook mm/slub.c:1366 [inline] kmalloc_large_node mm/slub.c:3808 [inline] __kmalloc_node+0x100e/0x1290 mm/slub.c:3818 kmalloc_node include/linux/slab.h:554 [inline] kvmalloc_node+0x1a5/0x2e0 mm/util.c:419 kvmalloc include/linux/mm.h:541 [inline] vhost_net_open+0x64/0x5f0 drivers/vhost/net.c:921 misc_open+0x7b5/0x8b0 drivers/char/misc.c:154 chrdev_open+0xc28/0xd90 fs/char_dev.c:417 do_dentry_open+0xccb/0x1430 fs/open.c:752 vfs_open+0x272/0x2e0 fs/open.c:866 do_last fs/namei.c:3378 [inline] path_openat+0x49ad/0x6580 fs/namei.c:3519 do_filp_open+0x267/0x640 fs/namei.c:3553 do_sys_open+0x6ad/0x9c0 fs/open.c:1059 SYSC_openat+0xc7/0xe0 fs/open.c:1086 SyS_openat+0x63/0x90 fs/open.c:1080 do_syscall_64+0x2f1/0x450 arch/x86/entry/common.c:287 ================================================================== Fixes: c67df11f6e480 ("vhost_net: try batch dequing from skb array") Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>