linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2020-02-02	selftests: net: Add FIN_ACK processing order related latency spike test	SeongJae Park	4	-0/+189
	This commit adds a test for FIN_ACK process races related reconnection latency spike issues. The issue has described and solved by the previous commit ("tcp: Reduce SYN resend delay if a suspicous ACK is received"). The test program is configured with a server and a client process. The server creates and binds a socket to a port that dynamically allocated, listen on it, and start a infinite loop. Inside the loop, it accepts connection, reads 4 bytes from the socket, and closes the connection. The client is constructed as an infinite loop. Inside the loop, it creates a socket with LINGER and NODELAY option, connect to the server, send 4 bytes data, try read some data from server. After the read() returns, it measure the latency from the beginning of this loop to this point and if the latency is larger than 1 second (spike), print a message. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-02	tcp: Reduce SYN resend delay if a suspicous ACK is received	SeongJae Park	1	-1/+7
	When closing a connection, the two acks that required to change closing socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in reverse order. This is possible in RSS disabled environments such as a connection inside a host. For example, expected state transitions and required packets for the disconnection will be similar to below flow. 00 (Process A) (Process B) 01 ESTABLISHED ESTABLISHED 02 close() 03 FIN_WAIT_1 04 ---FIN--> 05 CLOSE_WAIT 06 <--ACK--- 07 FIN_WAIT_2 08 <--FIN/ACK--- 09 TIME_WAIT 10 ---ACK--> 11 LAST_ACK 12 CLOSED CLOSED In some cases such as LINGER option applied socket, the FIN and FIN/ACK will be substituted to RST and RST/ACK, but there is no difference in the main logic. The acks in lines 6 and 8 are the acks. If the line 8 packet is processed before the line 6 packet, it will be just ignored as it is not a expected packet, and the later process of the line 6 packet will change the status of Process A to FIN_WAIT_2, but as it has already handled line 8 packet, it will not go to TIME_WAIT and thus will not send the line 10 packet to Process B. Thus, Process B will left in CLOSE_WAIT status, as below. 00 (Process A) (Process B) 01 ESTABLISHED ESTABLISHED 02 close() 03 FIN_WAIT_1 04 ---FIN--> 05 CLOSE_WAIT 06 (<--ACK---) 07 (<--FIN/ACK---) 08 (fired in right order) 09 <--FIN/ACK--- 10 <--ACK--- 11 (processed in reverse order) 12 FIN_WAIT_2 Later, if the Process B sends SYN to Process A for reconnection using the same port, Process A will responds with an ACK for the last flow, which has no increased sequence number. Thus, Process A will send RST, wait for TIMEOUT_INIT (one second in default), and then try reconnection. If reconnections are frequent, the one second latency spikes can be a big problem. Below is a tcpdump results of the problem: 14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644 14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512 14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298 /* ONE SECOND DELAY */ 15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644 This commit mitigates the problem by reducing the delay for the next SYN if the suspicous ACK is received while in SYN_SENT state. Following commit will add a selftest, which can be also helpful for understanding of this issue. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-02	MAINTAINERS: correct entries for ISDN/mISDN section	Lukas Bulwahn	1	-2/+4
	Commit 6d97985072dc ("isdn: move capi drivers to staging") cleaned up the isdn drivers and split the MAINTAINERS section for ISDN, but missed to add the terminal slash for the two directories mISDN and hardware. Hence, all files in those directories were not part of the new ISDN/mISDN SUBSYSTEM, but were considered to be part of "THE REST". Rectify the situation, and while at it, also complete the section with two further build files that belong to that subsystem. This was identified with a small script that finds all files belonging to "THE REST" according to the current MAINTAINERS file, and I investigated upon its output. Fixes: 6d97985072dc ("isdn: move capi drivers to staging") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-01	cls_rsvp: fix rsvp_policy	Eric Dumazet	1	-4/+2
	NLA_BINARY can be confusing, since .len value represents the max size of the blob. cls_rsvp really wants user space to provide long enough data for TCA_RSVP_DST and TCA_RSVP_SRC attributes. BUG: KMSAN: uninit-value in rsvp_get net/sched/cls_rsvp.h:258 [inline] BUG: KMSAN: uninit-value in gen_handle net/sched/cls_rsvp.h:402 [inline] BUG: KMSAN: uninit-value in rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572 CPU: 1 PID: 13228 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 rsvp_get net/sched/cls_rsvp.h:258 [inline] gen_handle net/sched/cls_rsvp.h:402 [inline] rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572 tc_new_tfilter+0x31fe/0x5010 net/sched/cls_api.c:2104 rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45b349 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f269d43dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f269d43e6d4 RCX: 000000000045b349 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000009c2 R14: 00000000004cb338 R15: 000000000075bfd4 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2774 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline] netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 6fa8c0144b77 ("[NET_SCHED]: Use nla_policy for attribute validation in classifiers") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-01	MAINTAINERS: Orphan HSR network protocol	Sven Eckelmann	1	-2/+1
	The current maintainer Arvid Brodin <arvid.brodin@alten.se> hasn't contributed to the kernel since 2015-02-27. His company mail address is also bouncing and the company confirmed (2020-01-31) that no Arvid Brodin is working for them: > Vi har dessvärre ingen Arvid Brodin som arbetar på ALTEN. A MIA person cannot be the maintainer. It is better to mark is as orphaned until some other person can jump in and take over the responsibility for HSR. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-01	qed: Fix a error code in qed_hw_init()	Dan Carpenter	1	-0/+1
	If the qed_fw_overlay_mem_alloc() then we should return -ENOMEM instead of success. Fixes: 30d5f85895fa ("qed: FW 8.42.2.0 Add fw overlay feature") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-01	octeontx2-pf: Fix an IS_ERR() vs NULL bug	Dan Carpenter	1	-2/+2
	The otx2_mbox_get_rsp() function never returns NULL, it returns error pointers on error. Fixes: 34bfe0ebedb7 ("octeontx2-pf: MTU, MAC and RX mode config support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	tcp: clear tp->segs_{in\|out} in tcp_disconnect()	Eric Dumazet	1	-0/+2
	tp->segs_in and tp->segs_out need to be cleared in tcp_disconnect(). tcp_disconnect() is rarely used, but it is worth fixing it. Fixes: 2efd055c53c0 ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	tcp: clear tp->data_segs{in\|out} in tcp_disconnect()	Eric Dumazet	1	-0/+2
	tp->data_segs_in and tp->data_segs_out need to be cleared in tcp_disconnect(). tcp_disconnect() is rarely used, but it is worth fixing it. Fixes: a44d6eacdaf5 ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	tcp: clear tp->delivered in tcp_disconnect()	Eric Dumazet	1	-0/+1
	tp->delivered needs to be cleared in tcp_disconnect(). tcp_disconnect() is rarely used, but it is worth fixing it. Fixes: ddf1af6fa00e ("tcp: new delivery accounting") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	tcp: clear tp->total_retrans in tcp_disconnect()	Eric Dumazet	1	-0/+1
	total_retrans needs to be cleared in tcp_disconnect(). tcp_disconnect() is rarely used, but it is worth fixing it. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	netfilter: nf_flowtable: fix documentation	Matteo Croce	1	-1/+1
	In the flowtable documentation there is a missing semicolon, the command as is would give this error: nftables.conf:5:27-33: Error: syntax error, unexpected devices, expecting newline or semicolon hook ingress priority 0 devices = { br0, pppoe-data }; ^^^^^^^ nftables.conf:4:12-13: Error: invalid hook (null) flowtable ft { ^^ Fixes: 19b351f16fd9 ("netfilter: add flowtable documentation") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-31	netfilter: flowtable: Fix setting forgotten NF_FLOW_HW_DEAD flag	Paul Blakey	1	-0/+1
	During the refactor this was accidently removed. Fixes: ae29045018c8 ("netfilter: flowtable: add nf_flow_offload_tuple() helper") Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-31	netfilter: flowtable: Fix missing flush hardware on table free	Paul Blakey	1	-0/+1
	If entries exist when freeing a hardware offload enabled table, we queue work for hardware while running the gc iteration. Execute it (flush) after queueing. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-31	netfilter: flowtable: Fix hardware flush order on nf_flow_table_cleanup	Paul Blakey	1	-1/+1
	On netdev down event, nf_flow_table_cleanup() is called for the relevant device and it cleans all the tables that are on that device. If one of those tables has hardware offload flag, nf_flow_table_iterate_cleanup flushes hardware and then runs the gc. But the gc can queue more hardware work, which will take time to execute. Instead first add the work, then flush it, to execute it now. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-31	netfilter: Use kvcalloc	Joe Perches	2	-4/+3
	Convert the uses of kvmalloc_array with __GFP_ZERO to the equivalent kvcalloc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-31	mlxsw: spectrum_qdisc: Fix 64-bit division error in mlxsw_sp_qdisc_tbf_rate_kbps	Nathan Chancellor	1	-1/+1
	When building arm32 allmodconfig: ERROR: "__aeabi_uldivmod" [drivers/net/ethernet/mellanox/mlxsw/mlxsw_spectrum.ko] undefined! rate_bytes_ps has type u64, we need to use a 64-bit division helper to avoid a build error. Fixes: a44f58c41bfb ("mlxsw: spectrum_qdisc: Support offloading of TBF Qdisc") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	ionic: fix rxq comp packet type mask	Shannon Nelson	1	-1/+1
	Be sure to include all the packet type bits in the mask. Fixes: fbfb8031533c ("ionic: Add hardware init and device commands") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	net: phy: at803x: disable vddio regulator	Michael Walle	1	-0/+11
	The probe() might enable a VDDIO regulator, which needs to be disabled again before calling regulator_put(). Add a remove() function. Fixes: 2f664823a470 ("net: phy: at803x: add device tree binding") Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	net: mii_timestamper: fix static allocation by PHY driver	Michael Walle	2	-1/+14
	If phydev->mii_ts is set by the PHY driver, it will always be overwritten in of_mdiobus_register_phy(). Fix it. Also make sure, that the unregister() doesn't do anything if the mii_timestamper was provided by the PHY driver. Fixes: 1dca22b18421 ("net: mdio: of: Register discovered MII time stampers.") Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-31	net: mdio: of: fix potential NULL pointer derefernce	Michael Walle	1	-3/+6
	of_find_mii_timestamper() returns NULL if no timestamper is found. Therefore, guard the unregister_mii_timestamper() calls. Fixes: 1dca22b18421 ("net: mdio: of: Register discovered MII time stampers.") Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-30	rxrpc: Fix missing active use pinning of rxrpc_local object	David Howells	5	-40/+62
	The introduction of a split between the reference count on rxrpc_local objects and the usage count didn't quite go far enough. A number of kernel work items need to make use of the socket to perform transmission. These also need to get an active count on the local object to prevent the socket from being closed. Fix this by getting the active count in those places. Also split out the raw active count get/put functions as these places tend to hold refs on the rxrpc_local object already, so getting and putting an extra object ref is just a waste of time. The problem can lead to symptoms like: BUG: kernel NULL pointer dereference, address: 0000000000000018 .. CPU: 2 PID: 818 Comm: kworker/u9:0 Not tainted 5.5.0-fscache+ #51 ... RIP: 0010:selinux_socket_sendmsg+0x5/0x13 ... Call Trace: security_socket_sendmsg+0x2c/0x3e sock_sendmsg+0x1a/0x46 rxrpc_send_keepalive+0x131/0x1ae rxrpc_peer_keepalive_worker+0x219/0x34b process_one_work+0x18e/0x271 worker_thread+0x1a3/0x247 kthread+0xe6/0xeb ret_from_fork+0x1f/0x30 Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting") Signed-off-by: David Howells <dhowells@redhat.com>
2020-01-30	rxrpc: Fix insufficient receive notification generation	David Howells	1	-4/+2
	In rxrpc_input_data(), rxrpc_notify_socket() is called if the base sequence number of the packet is immediately following the hard-ack point at the end of the function. However, this isn't sufficient, since the recvmsg side may have been advancing the window and then overrun the position in which we're adding - at which point rx_hard_ack >= seq0 and no notification is generated. Fix this by always generating a notification at the end of the input function. Without this, a long call may stall, possibly indefinitely. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
2020-01-30	rxrpc: Fix use-after-free in rxrpc_put_local()	David Howells	1	-1/+4
	Fix rxrpc_put_local() to not access local->debug_id after calling atomic_dec_return() as, unless that returned n==0, we no longer have the right to access the object. Fixes: 06d9532fa6b3 ("rxrpc: Fix read-after-free in rxrpc_queue_local()") Signed-off-by: David Howells <dhowells@redhat.com>
2020-01-30	net/core: Do not clear VF index for node/port GUIDs query	Leon Romanovsky	1	-2/+2
	VF numbers were assigned to node_guid and port_guid, but cleared right before such query calls were issued. It caused to return node/port GUIDs of VF index 0 for all VFs. Fixes: 30aad41721e0 ("net/core: Add support for getting VF GUIDs") Reported-by: Adrian Chiris <adrianc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30	y2038: sparc: remove use of struct timex	Arnd Bergmann	2	-16/+19
	'struct timex' is one of the last users of 'struct timeval' and is only referenced in one place in the kernel any more, to convert the user space timex into the kernel-internal version on sparc64, with a different tv_usec member type. As a preparation for hiding the time_t definition and everything using that in the kernel, change the implementation once more to only convert the timeval member, and then enclose the struct definition in an #ifdef. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30	sparc64: add support for folded p4d page tables	Mike Rapoport	7	-32/+84
	Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30	ide: make drive->dn read only	Dan Carpenter	2	-1/+5
	The IDE core always sets ->dn correctly so changing it is never required. Setting it to a different value than assigned by IDE core is very likely to result in data corruption (due to wrong transfer timings being set on the controller etc.) Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Tested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30	mptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6	Geert Uytterhoeven	3	-12/+9
	If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m: ERROR: "mptcp_handle_ipv6_mapped" [net/ipv6/ipv6.ko] undefined! This does not happen if CONFIG_MPTCP_IPV6=y, as CONFIG_MPTCP_IPV6 selects CONFIG_IPV6, and thus forces CONFIG_IPV6 builtin. As exporting a symbol for an empty function would be a bit wasteful, fix this by providing a dummy version of mptcp_handle_ipv6_mapped() for the CONFIG_MPTCP_IPV6=n case. Rename mptcp_handle_ipv6_mapped() to mptcpv6_handle_mapped(), to make it clear this is a pure-IPV6 function, just like mptcpv6_init(). Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30	net: drop_monitor: Use kstrdup	Joe Perches	1	-6/+2
	Convert the equivalent but rather odd uses of kmemdup with __GFP_ZERO to the more common kstrdup and avoid unnecessary zeroing of copied over memory. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30	udp: document udp_rcv_segment special case for looped packets	Willem de Bruijn	1	-0/+7
	Commit 6cd021a58c18a ("udp: segment looped gso packets correctly") fixes an issue with rare udp gso multicast packets looped onto the receive path. The stable backport makes the narrowest change to target only these packets, when needed. As opposed to, say, expanding __udp_gso_segment, which is harder to reason to be free from unintended side-effects. But the resulting code is hardly self-describing. Document its purpose and rationale. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30	mptcp: MPTCP_HMAC_TEST should depend on MPTCP	Geert Uytterhoeven	1	-2/+4
	As the MPTCP HMAC test is integrated into the MPTCP code, it can be built only when MPTCP is enabled. Hence when MPTCP is disabled, asking the user if the test code should be enabled is futile. Wrap the whole block of MPTCP-specific config options inside a check for MPTCP. While at it, drop the "default n" for MPTCP_HMAC_TEST, as that is the default anyway. Fixes: 65492c5a6ab5df50 ("mptcp: move from sha1 (v0) to sha256 (v1)") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30	mptcp: Fix incorrect IPV6 dependency check	Geert Uytterhoeven	1	-1/+1
	If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m: net/mptcp/protocol.o: In function `__mptcp_tcp_fallback': protocol.c:(.text+0x786): undefined reference to `inet6_stream_ops' Fix this by checking for CONFIG_MPTCP_IPV6 instead of CONFIG_IPV6, like is done in all other places in the mptcp code. Fixes: 8ab183deb26a3b79 ("mptcp: cope with later TCP fallback") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	io_uring: add support for epoll_ctl(2)	Jens Axboe	2	-0/+72
	This adds IORING_OP_EPOLL_CTL, which can perform the same work as the epoll_ctl(2) system call. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29	eventpoll: support non-blocking do_epoll_ctl() calls	Jens Axboe	2	-13/+42
	Also make it available outside of epoll, along with the helper that decides if we need to copy the passed in epoll_event. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29	eventpoll: abstract out epoll_ctl() handler	Jens Axboe	1	-20/+25
	No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29	io_uring: fix linked command file table usage	Jens Axboe	3	-14/+21
	We're not consistent in how the file table is grabbed and assigned if we have a command linked that requires the use of it. Add ->file_table to the io_op_defs[] array, and use that to determine when to grab the table instead of having the handlers set it if they need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES flag. We always initialize work->files, so io-wq can just check for that. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29	Revert "MAINTAINERS: mptcp@ mailing list is moderated"	Mat Martineau	1	-1/+1
	This reverts commit 74759e1693311a8d1441de836c4080c192374238. mptcp@lists.01.org accepts messages from non-subscribers. There was an invisible and unexpected server-wide rule limiting the number of recipients for subscribers and non-subscribers alike, and that has now been turned off for this list. Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	netfilter: ipset: fix suspicious RCU usage in find_set_and_id	Kadlecsik József	1	-20/+21
	find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held. However, in the error path there can be a follow-up recvmsg() without the mutex held. Use the start() function of struct netlink_dump_control instead of dump() to verify and report if the specified set does not exist. Thanks to Pablo Neira Ayuso for helping me to understand the subleties of the netlink protocol. Reported-by: syzbot+fc69d7cb21258ab4ae4d@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-29	mptcp: handle tcp fallback when using syn cookies	Florian Westphal	5	-3/+14
	We can't deal with syncookie mode yet, the syncookie rx path will create tcp reqsk, i.e. we get OOB access because we treat tcp reqsk as mptcp reqsk one: TCP: SYN flooding on port 20002. Sending cookies. BUG: KASAN: slab-out-of-bounds in subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191 Read of size 1 at addr ffff8881167bc148 by task syz-executor099/2120 subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191 tcp_get_cookie_sock+0xcf/0x520 net/ipv4/syncookies.c:209 cookie_v6_check+0x15a5/0x1e90 net/ipv6/syncookies.c:252 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1123 [inline] [..] Bug can be reproduced via "sysctl net.ipv4.tcp_syncookies=2". Note that MPTCP should work with syncookies (4th ack would carry needed state), but it appears better to sort that out in -next so do tcp fallback for now. I removed the MPTCP ifdef for tcp_rsk "is_mptcp" member because if (IS_ENABLED()) is easier to read than "#ifdef IS_ENABLED()/#endif" pair. Cc: Eric Dumazet <edumazet@google.com> Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections") Reported-by: Christoph Paasch <cpaasch@apple.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	mptcp: avoid a lockdep splat when mcast group was joined	Florian Westphal	1	-2/+6
	syzbot triggered following lockdep splat: ffffffff82d2cd40 (rtnl_mutex){+.+.}, at: ip_mc_drop_socket+0x52/0x180 but task is already holding lock: ffff8881187a2310 (sk_lock-AF_INET){+.+.}, at: mptcp_close+0x18/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (sk_lock-AF_INET){+.+.}: lock_acquire+0xee/0x230 lock_sock_nested+0x89/0xc0 do_ip_setsockopt.isra.0+0x335/0x22f0 ip_setsockopt+0x35/0x60 tcp_setsockopt+0x5d/0x90 __sys_setsockopt+0xf3/0x190 __x64_sys_setsockopt+0x61/0x70 do_syscall_64+0x72/0x300 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (rtnl_mutex){+.+.}: check_prevs_add+0x2b7/0x1210 __lock_acquire+0x10b6/0x1400 lock_acquire+0xee/0x230 __mutex_lock+0x120/0xc70 ip_mc_drop_socket+0x52/0x180 inet_release+0x36/0xe0 __sock_release+0xfd/0x130 __mptcp_close+0xa8/0x1f0 inet_release+0x7f/0xe0 __sock_release+0x69/0x130 sock_close+0x18/0x20 __fput+0x179/0x400 task_work_run+0xd5/0x110 do_exit+0x685/0x1510 do_group_exit+0x7e/0x170 __x64_sys_exit_group+0x28/0x30 do_syscall_64+0x72/0x300 entry_SYSCALL_64_after_hwframe+0x49/0xbe The trigger is: socket(AF_INET, SOCK_STREAM, 0x106 /* IPPROTO_MPTCP */) = 4 setsockopt(4, SOL_IP, MCAST_JOIN_GROUP, {gr_interface=7, gr_group={sa_family=AF_INET, sin_port=htons(20003), sin_addr=inet_addr("224.0.0.2")}}, 136) = 0 exit(0) Which results in a call to rtnl_lock while we are holding the parent mptcp socket lock via mptcp_close -> lock_sock(msk) -> inet_release -> ip_mc_drop_socket -> rtnl_lock(). >From lockdep point of view we thus have both 'rtnl_lock; lock_sock' and 'lock_sock; rtnl_lock'. Fix this by stealing the msk conn_list and doing the subflow close without holding the msk lock. Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections") Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	mptcp: fix panic on user pointer access	Florian Westphal	1	-18/+22
	Its not possible to call the kernel_(s\|g)etsockopt functions here, the address points to user memory: General protection fault in user access. Non-canonical address? WARNING: CPU: 1 PID: 5352 at arch/x86/mm/extable.c:77 ex_handler_uaccess+0xba/0xe0 arch/x86/mm/extable.c:77 Kernel panic - not syncing: panic_on_warn set ... [..] Call Trace: fixup_exception+0x9d/0xcd arch/x86/mm/extable.c:178 general_protection+0x2d/0x40 arch/x86/entry/entry_64.S:1202 do_ip_getsockopt+0x1f6/0x1860 net/ipv4/ip_sockglue.c:1323 ip_getsockopt+0x87/0x1c0 net/ipv4/ip_sockglue.c:1561 tcp_getsockopt net/ipv4/tcp.c:3691 [inline] tcp_getsockopt+0x8c/0xd0 net/ipv4/tcp.c:3685 kernel_getsockopt+0x121/0x1f0 net/socket.c:3736 mptcp_getsockopt+0x69/0x90 net/mptcp/protocol.c:830 __sys_getsockopt+0x13a/0x220 net/socket.c:2175 We can call tcp_get/setsockopt functions instead. Doing so fixes crashing, but still leaves rtnl related lockdep splat: WARNING: possible circular locking dependency detected 5.5.0-rc6 #2 Not tainted ------------------------------------------------------ syz-executor.0/16334 is trying to acquire lock: ffffffff84f7a080 (rtnl_mutex){+.+.}, at: do_ip_setsockopt.isra.0+0x277/0x3820 net/ipv4/ip_sockglue.c:644 but task is already holding lock: ffff888116503b90 (sk_lock-AF_INET){+.+.}, at: lock_sock include/net/sock.h:1516 [inline] ffff888116503b90 (sk_lock-AF_INET){+.+.}, at: mptcp_setsockopt+0x28/0x90 net/mptcp/protocol.c:1284 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (sk_lock-AF_INET){+.+.}: lock_sock_nested+0xca/0x120 net/core/sock.c:2944 lock_sock include/net/sock.h:1516 [inline] do_ip_setsockopt.isra.0+0x281/0x3820 net/ipv4/ip_sockglue.c:645 ip_setsockopt+0x44/0xf0 net/ipv4/ip_sockglue.c:1248 udp_setsockopt+0x5d/0xa0 net/ipv4/udp.c:2639 __sys_setsockopt+0x152/0x240 net/socket.c:2130 __do_sys_setsockopt net/socket.c:2146 [inline] __se_sys_setsockopt net/socket.c:2143 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143 do_syscall_64+0xbd/0x5b0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (rtnl_mutex){+.+.}: check_prev_add kernel/locking/lockdep.c:2475 [inline] check_prevs_add kernel/locking/lockdep.c:2580 [inline] validate_chain kernel/locking/lockdep.c:2970 [inline] __lock_acquire+0x1fb2/0x4680 kernel/locking/lockdep.c:3954 lock_acquire+0x127/0x330 kernel/locking/lockdep.c:4484 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x158/0x1340 kernel/locking/mutex.c:1103 do_ip_setsockopt.isra.0+0x277/0x3820 net/ipv4/ip_sockglue.c:644 ip_setsockopt+0x44/0xf0 net/ipv4/ip_sockglue.c:1248 tcp_setsockopt net/ipv4/tcp.c:3159 [inline] tcp_setsockopt+0x8c/0xd0 net/ipv4/tcp.c:3153 kernel_setsockopt+0x121/0x1f0 net/socket.c:3767 mptcp_setsockopt+0x69/0x90 net/mptcp/protocol.c:1288 __sys_setsockopt+0x152/0x240 net/socket.c:2130 __do_sys_setsockopt net/socket.c:2146 [inline] __se_sys_setsockopt net/socket.c:2143 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143 do_syscall_64+0xbd/0x5b0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_INET); lock(rtnl_mutex); lock(sk_lock-AF_INET); lock(rtnl_mutex); The lockdep complaint is because we hold mptcp socket lock when calling the sk_prot get/setsockopt handler, and those might need to acquire the rtnl mutex. Normally, order is: rtnl_lock(sk) -> lock_sock Whereas for mptcp the order is lock_sock(mptcp_sk) rtnl_lock -> lock_sock(subflow_sk) We can avoid this by releasing the mptcp socket lock early, but, as Paolo points out, we need to get/put the subflow socket refcount before doing so to avoid race with concurrent close(). Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations") Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	mptcp: defer freeing of cached ext until last moment	Florian Westphal	1	-2/+4
	access to msk->cached_ext is only legal if the msk is locked or all concurrent accesses are impossible. Furthermore, once we start to tear down, we must make sure nothing else can step in and allocate a new cached ext. So place this code in the destroy callback where it belongs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	net: mvneta: fix XDP support if sw bm is used as fallback	Lorenzo Bianconi	1	-3/+7
	In order to fix XDP support if sw buffer management is used as fallback for hw bm devices, define MVNETA_SKB_HEADROOM as maximum between XDP_PACKET_HEADROOM and NET_SKB_PAD and let the hw aligns the IP header to 4-byte boundary. Fix rx_offset_correction initialization if mvneta_bm_port_init fails in mvneta_resume routine Fixes: 0db51da7a8e9 ("net: mvneta: add basic XDP support") Tested-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	sch_choke: Use kvcalloc	Joe Perches	1	-1/+1
	Convert the use of kvmalloc_array with __GFP_ZERO to the equivalent kvcalloc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	mptcp: Fix build with PROC_FS disabled.	David S. Miller	1	-0/+2
	net/mptcp/subflow.c: In function ‘mptcp_subflow_create_socket’: net/mptcp/subflow.c:624:25: error: ‘struct netns_core’ has no member named ‘sock_inuse’ Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	MAINTAINERS: mptcp@ mailing list is moderated	Randy Dunlap	1	-1/+1
	Note that mptcp@lists.01.org is moderated, like we note for other mailing lists. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Cc: Matthieu Baerts <matthieu.baerts@tessares.net> Cc: netdev@vger.kernel.org Cc: mptcp@lists.01.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29	drm/nouveau/fb/gp102-: allow module to load even when scrubber binary is missing	Ben Skeggs	2	-12/+32
	Without relaxing this requirement, TU10x boards will fail to load without an updated linux-firmware, and TU11x will completely fail to load because FW isn't available yet. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-29	drm/nouveau/acr: return error when registering LSF if ACR not supported	Ben Skeggs	1	-1/+5
	This fixes an oops on TU11x GPUs where SEC2 attempts to register its falcon, and triggers a NULL-pointer deref because ACR isn't yet supported. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-29	drm/nouveau/disp/gv100-: not all channel types support reporting error codes	Ben Skeggs	1	-6/+17
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>