linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-05-18	kselftests: netfilter: fix leftover net/net-next merge conflict	Florian Westphal	1	-51/+26
	In nf-next, I had extended this script to also cover NAT support for the inet family. In nf, I extended it to cover a regression with 'fully-random' masquerade. Make this script work again by resolving the conflicts as needed. Fixes: 8b4483658364f0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-18	mlxsw: core: Prevent reading unsupported slave address from SFP EEPROM	Vadim Pasternak	1	-2/+16
	Prevent reading unsupported slave address from SFP EEPROM by testing Diagnostic Monitoring Type byte in EEPROM. Read only page zero of EEPROM, in case this byte is zero. If some SFP transceiver does not support Digital Optical Monitoring (DOM), reading SFP EEPROM slave address 0x51 could return an error. Availability of DOM support is verified by reading from zero page Diagnostic Monitoring Type byte describing how diagnostic monitoring is implemented by transceiver. If bit 6 of this byte is set, it indicates that digital diagnostic monitoring has been implemented. Otherwise it is not and transceiver could fail to reply to transaction for slave address 0x51 [1010001X (A2h)], which is used to access measurements page. Such issue has been observed when reading cable MCP2M00-xxxx, MCP7F00-xxxx, and few others. Fixes: 2ea109039cd3 ("mlxsw: spectrum: Add support for access cable info via ethtool") Fixes: 4400081b631a ("mlxsw: spectrum: Fix EEPROM access in case of SFP/SFP+") Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-18	mlxsw: core: Prevent QSFP module initialization for old hardware	Vadim Pasternak	4	-0/+17
	Old Mellanox silicons, like switchx-2, switch-ib do not support reading QSFP modules temperature through MTMP register. Attempt to access this register on systems equipped with the this kind of silicon will cause initialization flow failure. Test for hardware resource capability is added in order to distinct between old and new silicon - old silicons do not have such capability. Fixes: 6a79507cfe94 ("mlxsw: core: Extend thermal module with per QSFP module thermal zones") Fixes: 5c42eaa07bd0 ("mlxsw: core: Extend hwmon interface with QSFP module temperature attributes") Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-18	vsock/virtio: Initialize core virtio vsock before registering the driver	Jorge E. Moreira	1	-7/+6
	Avoid a race in which static variables in net/vmw_vsock/af_vsock.c are accessed (while handling interrupts) before they are initialized. [ 4.201410] BUG: unable to handle kernel paging request at ffffffffffffffe8 [ 4.207829] IP: vsock_addr_equals_addr+0x3/0x20 [ 4.211379] PGD 28210067 P4D 28210067 PUD 28212067 PMD 0 [ 4.211379] Oops: 0000 [#1] PREEMPT SMP PTI [ 4.211379] Modules linked in: [ 4.211379] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 4.14.106-419297-gd7e28cc1f241 #1 [ 4.211379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 4.211379] Workqueue: virtio_vsock virtio_transport_rx_work [ 4.211379] task: ffffa3273d175280 task.stack: ffffaea1800e8000 [ 4.211379] RIP: 0010:vsock_addr_equals_addr+0x3/0x20 [ 4.211379] RSP: 0000:ffffaea1800ebd28 EFLAGS: 00010286 [ 4.211379] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffffffffb94e42f0 [ 4.211379] RDX: 0000000000000400 RSI: ffffffffffffffe0 RDI: ffffaea1800ebdd0 [ 4.211379] RBP: ffffaea1800ebd58 R08: 0000000000000001 R09: 0000000000000001 [ 4.211379] R10: 0000000000000000 R11: ffffffffb89d5d60 R12: ffffaea1800ebdd0 [ 4.211379] R13: 00000000828cbfbf R14: 0000000000000000 R15: ffffaea1800ebdc0 [ 4.211379] FS: 0000000000000000(0000) GS:ffffa3273fd00000(0000) knlGS:0000000000000000 [ 4.211379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.211379] CR2: ffffffffffffffe8 CR3: 000000002820e001 CR4: 00000000001606e0 [ 4.211379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4.211379] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4.211379] Call Trace: [ 4.211379] ? vsock_find_connected_socket+0x6c/0xe0 [ 4.211379] virtio_transport_recv_pkt+0x15f/0x740 [ 4.211379] ? detach_buf+0x1b5/0x210 [ 4.211379] virtio_transport_rx_work+0xb7/0x140 [ 4.211379] process_one_work+0x1ef/0x480 [ 4.211379] worker_thread+0x312/0x460 [ 4.211379] kthread+0x132/0x140 [ 4.211379] ? process_one_work+0x480/0x480 [ 4.211379] ? kthread_destroy_worker+0xd0/0xd0 [ 4.211379] ret_from_fork+0x35/0x40 [ 4.211379] Code: c7 47 08 00 00 00 00 66 c7 07 28 00 c7 47 08 ff ff ff ff c7 47 04 ff ff ff ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 8b 47 08 <3b> 46 08 75 0a 8b 47 04 3b 46 04 0f 94 c0 c3 31 c0 c3 90 66 2e [ 4.211379] RIP: vsock_addr_equals_addr+0x3/0x20 RSP: ffffaea1800ebd28 [ 4.211379] CR2: ffffffffffffffe8 [ 4.211379] ---[ end trace f31cc4a2e6df3689 ]--- [ 4.211379] Kernel panic - not syncing: Fatal exception in interrupt [ 4.211379] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 4.211379] Rebooting in 5 seconds.. Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug") Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Cc: kernel-team@android.com Cc: stable@vger.kernel.org [4.9+] Signed-off-by: Jorge E. Moreira <jemoreira@google.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-17	net/mlx5e: Fix possible modify header actions memory leak	Eli Britstein	1	-1/+4
	The cited commit could disable the modify header flag, but did not free the allocated memory for the modify header actions. Fix it. Fixes: 27c11b6b844cd ("net/mlx5e: Do not rewrite fields with the same match") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5e: Fix no rewrite fields with the same match	Eli Britstein	1	-6/+16
	With commit 27c11b6b844c ("net/mlx5e: Do not rewrite fields with the same match") there are no rewrites if the rewrite value is the same as the matched value. However, if the field is not matched, the rewrite is also wrongly skipped. Fix it. Fixes: 27c11b6b844c ("net/mlx5e: Do not rewrite fields with the same match") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5e: Additional check for flow destination comparison	Dmytro Linkin	1	-0/+2
	Flow destination comparison has an inaccuracy: code see no difference between same vf ports, which belong to different pfs. Example: If start ping from VF0 (PF1) to VF1 (PF1) and mirror all traffic to VF0 (PF2), icmp reply to VF0 (PF1) and mirrored flow to VF0 (PF2) would be determined as same destination. It lead to creating flow handler with rule nodes, which not added to node tree. When later driver try to delete this flow rules we got kernel crash. Add comparison of vhca_id field to avoid this. Fixes: 1228e912c934 ("net/mlx5: Consider encapsulation properties when comparing destinations") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5e: Add missing ethtool driver info for representors	Dmytro Linkin	1	-1/+18
	For all representors added firmware version info to show in ethtool driver info. For uplink representor, because only it is tied to the pci device sysfs, added pci bus info. Fixes: ff9b85de5d5d ("net/mlx5e: Add some ethtool port control entries to the uplink rep netdev") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5e: Fix number of vports for ingress ACL configuration	Eli Britstein	1	-4/+5
	With the cited commit, ACLs are configured for the VF ports. The loop for the number of ports had the wrong number. Fix it. Fixes: 184867373d8c ("net/mlx5e: ACLs for priority tag mode") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled	Saeed Mahameed	1	-1/+17
	ethtool user spaces needs to know ring count via ETHTOOL_GRXRINGS when executing (ethtool -x) which is retrieved via ethtool get_rxnfc callback, in mlx5 this callback is disabled when CONFIG_MLX5_EN_RXNFC=n. This patch allows only ETHTOOL_GRXRINGS command on mlx5e_get_rxnfc() when CONFIG_MLX5_EN_RXNFC is disabled, so ethtool -x will continue working. Fixes: fe6d86b3c316 ("net/mlx5e: Add CONFIG_MLX5_EN_RXNFC for ethtool rx nfc") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5e: Fix wrong xmit_more application	Tariq Toukan	3	-6/+8
	Cited patch refactored the xmit_more indication while not preserving its functionality. Fix it. Fixes: 3c31ff22b25f ("drivers: mellanox: use netdev_xmit_more() helper") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5: Fix peer pf disable hca command	Bodong Wang	1	-1/+1
	The command was mistakenly using enable_hca in embedded CPU field. Fixes: 22e939a91dcb (net/mlx5: Update enable HCA dependency) Signed-off-by: Bodong Wang <bodong@mellanox.com> Reported-by: Alex Rosenbaum <alexr@mellanox.com> Signed-off-by: Alex Rosenbaum <alexr@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index	Parav Pandit	6	-41/+43
	To avoid any ambiguity between vport index and vport number, rename functions that had vport, to vport_num or vport_index appropriately. vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return type to u16. vport_index is an int in vport array. Hence change input type of vport index in mlx5_eswitch_index_to_vport_num() to int. Correct multiple eswitch representor interfaces use type u16 of rep->vport as type int vport_index. Send vport FW commands with correct eswitch u16 vport_num instead host int vport_index. Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5: Add meaningful return codes to status_to_err function	Valentine Fatiev	1	-1/+21
	Current version of function status_to_err return -1 for any status returned by mlx5_cmd_invoke function. In case status is MLX5_DRIVER_STATUS_ABORTED we should return 0 to the caller as we assume command completed successfully on FW. If error returned we are getting confusing messages in dmesg. In addition, currently returned value -1 is confusing with -EPERM. New implementation actually fix original commit and return meaningful codes for commands delivery status and print message in case of failure. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	net/mlx5: Imply MLXFW in mlx5_core	Saeed Mahameed	1	-0/+1
	mlxfw can be compiled as external module while mlx5_core can be builtin, in such case mlx5 will act like mlxfw is disabled. Since mlxfw is just a service library for mlx* drivers, imply it in mlx5_core to make it always reachable if it was enabled. Fixes: 3ffaabecd1a1 ("net/mlx5e: Support the flash device ethtool callback") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17	Revert "tipc: fix modprobe tipc failed after switch order of device registration"	David S. Miller	1	-7/+7
	This reverts commit 532b0f7ece4cb2ffd24dc723ddf55242d1188e5e. More revisions coming up. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-17	vsock/virtio: free packets during the socket release	Stefano Garzarella	1	-0/+7
	When the socket is released, we should free all packets queued in the per-socket list in order to avoid a memory leak. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-17	tipc: fix modprobe tipc failed after switch order of device registration	Junwei Hu	1	-7/+7
	Error message printed: modprobe: ERROR: could not insert 'tipc': Address family not supported by protocol. when modprobe tipc after the following patch: switch order of device registration, commit 7e27e8d6130c ("tipc: switch order of device registration to fix a crash") Because sock_create_kern(net, AF_TIPC, ...) is called by tipc_topsrv_create_listener() in the initialization process of tipc_net_ops, tipc_socket_init() must be execute before that. I move tipc_socket_init() into function tipc_init_net(). Fixes: 7e27e8d6130c ("tipc: switch order of device registration to fix a crash") Signed-off-by: Junwei Hu <hujunwei4@huawei.com> Reported-by: Wang Wang <wangwang2@huawei.com> Reviewed-by: Kang Zhou <zhoukang7@huawei.com> Reviewed-by: Suanming Mou <mousuanming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-17	lib: Correct comment of prandom_seed	Philippe Mazenauer	1	-2/+2
	Variable 'entropy' was wrongly documented as 'seed', changed comment to reflect actual variable name. ../lib/random32.c:179: warning: Function parameter or member 'entropy' not described in 'prandom_seed' ../lib/random32.c:179: warning: Excess function parameter 'seed' description in 'prandom_seed' Signed-off-by: Philippe Mazenauer <philippe.mazenauer@outlook.de> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-17	net: caif: fix the value of size argument of snprintf	swkhack	5	-6/+5
	Because the function snprintf write at most size bytes(including the null byte).So the value of the argument size need not to minus one. Signed-off-by: swkhack <swkhack@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-17	bpftool: fix BTF raw dump of FWD's fwd_kind	Andrii Nakryiko	1	-2/+2
	kflag bit determines whether FWD is for struct or union. Use that bit. Fixes: c93cc69004df ("bpftool: add ability to dump BTF types") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-17	selftests/bpf: fix bpf_get_current_task	Alexei Starovoitov	1	-1/+1
	Fix bpf_get_current_task() declaration. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-16	ipv6: fix src addr routing with the exception table	Wei Wang	1	-24/+27
	When inserting route cache into the exception table, the key is generated with both src_addr and dest_addr with src addr routing. However, current logic always assumes the src_addr used to generate the key is a /128 host address. This is not true in the following scenarios: 1. When the route is a gateway route or does not have next hop. (rt6_is_gw_or_nonexthop() == false) 2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL. This means, when looking for a route cache in the exception table, we have to do the lookup twice: first time with the passed in /128 host address, second time with the src_addr stored in fib6_info. This solves the pmtu discovery issue reported by Mikael Magnusson where a route cache with a lower mtu info is created for a gateway route with src addr. However, the lookup code is not able to find this route cache. Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se> Bisected-by: David Ahern <dsahern@gmail.com> Signed-off-by: Wei Wang <weiwan@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	selftests: pmtu.sh: Remove quotes around commands in setup_xfrm	David Ahern	1	-9/+9
	The first command in setup_xfrm is failing resulting in the test getting skipped: + ip netns exec ns-B ip -6 xfrm state add src fd00:1::a dst fd00:1::b spi 0x1000 proto esp aead 'rfc4106(gcm(aes))' 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel + out=RTNETLINK answers: Function not implemented ... xfrm6 not supported TEST: vti6: PMTU exceptions [SKIP] xfrm4 not supported TEST: vti4: PMTU exceptions [SKIP] ... The setup command started failing when the run_cmd option was added. Removing the quotes fixes the problem: ... TEST: vti6: PMTU exceptions [ OK ] TEST: vti4: PMTU exceptions [ OK ] ... Fixes: 56490b623aa0 ("selftests: Add debugging options to pmtu.sh") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	net: avoid weird emergency message	Eric Dumazet	1	-1/+1
	When host is under high stress, it is very possible thread running netdev_wait_allrefs() returns from msleep(250) 10 seconds late. This leads to these messages in the syslog : [...] unregister_netdevice: waiting for syz_tun to become free. Usage count = 0 If the device refcount is zero, the wait is over. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	aqc111: cleanup mtu related logic	Igor Russkikh	1	-4/+2
	Original fix b8b277525e9d was done under impression that invalid data could be written for mtu configuration higher that 16334. But the high limit will anyway be rejected my max_mtu check in caller. Thus, make the code cleaner and allow it doing the configuration without checking for maximum mtu value. Fixes: b8b277525e9d ("aqc111: fix endianness issue in aqc111_change_mtu") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	Revert "aqc111: fix writing to the phy on BE"	Igor Russkikh	1	-17/+6
	This reverts commit 369b46e9fbcfa5136f2cb5f486c90e5f7fa92630. The required temporary storage is already done inside of write32/16 helpers. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	Revert "aqc111: fix double endianness swap on BE"	Igor Russkikh	1	-4/+2
	This reverts commit 2cf672709beb005f6e90cb4edbed6f2218ba953e. The required temporary storage is already done inside of write32/16 helpers. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	xfrm: ressurrect "Fix uninitialized memory read in _decode_session4"	Florian Westphal	1	-11/+13
	This resurrects commit 8742dc86d0c7a9628 ("xfrm4: Fix uninitialized memory read in _decode_session4"), which got lost during a merge conflict resolution between ipsec-next and net-next tree. c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy") in ipsec-next moved the (buggy) _decode_session4 from net/ipv4/xfrm4_policy.c to net/xfrm/xfrm_policy.c. In mean time, 8742dc86d0c7a was applied to ipsec.git and fixed the problem in the "old" location. When the trees got merged, the moved, old function was kept. This applies the "lost" commit again, to the new location. Fixes: a658a3f2ecbab ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	libbpf: move logging helpers into libbpf_internal.h	Andrii Nakryiko	5	-16/+15
	libbpf_util.h header was recently exposed as public as a dependency of xsk.h. In addition to memory barriers, it contained logging helpers, which are not supposed to be exposed. This patch moves those into libbpf_internal.h, which is kept as an internal header. Cc: Stanislav Fomichev <sdf@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Fixes: 7080da890984 ("libbpf: add libbpf_util.h to header install.") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-16	tipc: switch order of device registration to fix a crash	Junwei Hu	1	-7/+7
	When tipc is loaded while many processes try to create a TIPC socket, a crash occurs: PANIC: Unable to handle kernel paging request at virtual address "dfff20000000021d" pc : tipc_sk_create+0x374/0x1180 [tipc] lr : tipc_sk_create+0x374/0x1180 [tipc] Exception class = DABT (current EL), IL = 32 bits Call trace: tipc_sk_create+0x374/0x1180 [tipc] __sock_create+0x1cc/0x408 __sys_socket+0xec/0x1f0 __arm64_sys_socket+0x74/0xa8 ... This is due to race between sock_create and unfinished register_pernet_device. tipc_sk_insert tries to do "net_generic(net, tipc_net_id)". but tipc_net_id is not initialized yet. So switch the order of the two to close the race. This can be reproduced with multiple processes doing socket(AF_TIPC, ...) and one process doing module removal. Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace") Signed-off-by: Junwei Hu <hujunwei4@huawei.com> Reported-by: Wang Wang <wangwang2@huawei.com> Reviewed-by: Xiaogang Wang <wangxiaogang3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	ipv6: prevent possible fib6 leaks	Eric Dumazet	3	-4/+18
	At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible for finding all percpu routes and set their ->from pointer to NULL, so that fib6_ref can reach its expected value (1). The problem right now is that other cpus can still catch the route being deleted, since there is no rcu grace period between the route deletion and call to fib6_drop_pcpu_from() This can leak the fib6 and associated resources, since no notifier will take care of removing the last reference(s). I decided to add another boolean (fib6_destroying) instead of reusing/renaming exception_bucket_flushed to ease stable backports, and properly document the memory barriers used to implement this fix. This patch has been co-developped with Wei Wang. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Wei Wang <weiwan@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin Lau <kafai@fb.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	net: test nouarg before dereferencing zerocopy pointers	Willem de Bruijn	1	-3/+6
	Zerocopy skbs without completion notification were added for packet sockets with PACKET_TX_RING user buffers. Those signal completion through the TP_STATUS_USER bit in the ring. Zerocopy annotation was added only to avoid premature notification after clone or orphan, by triggering a copy on these paths for these packets. The mechanism had to define a special "no-uarg" mode because packet sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg for a different pointer. Before deferencing skb_uarg(skb), verify that it is a real pointer. Fixes: 5cd8d46ea1562 ("packet: copy user buffers before orphan or clone") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	net: usb: qmi_wwan: add Telit 0x1260 and 0x1261 compositions	Daniele Palmas	1	-0/+2
	Added support for Telit LE910Cx 0x1260 and 0x1261 compositions. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	net: phy: aquantia: readd XGMII support for AQR107	Madalin-cristian Bucur	1	-0/+1
	XGMII interface mode no longer works on AQR107 after the recent changes, adding back support. Fixes: 570c8a7d5303 ("net: phy: aquantia: check for supported interface modes in config_init") Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	net: bpfilter: fallback to netfilter if failed to load bpfilter kernel module	Konstantin Khlebnikov	1	-4/+2
	If bpfilter is not available return ENOPROTOOPT to fallback to netfilter. Function request_module() returns both errors and userspace exit codes. Just ignore them. Rechecking bpfilter_ops is enough. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	hv_sock: Add support for delayed close	Sunil Muthuswamy	1	-31/+77
	Currently, hvsock does not implement any delayed or background close logic. Whenever the hvsock socket is closed, a FIN is sent to the peer, and the last reference to the socket is dropped, which leads to a call to .destruct where the socket can hang indefinitely waiting for the peer to close it's side. The can cause the user application to hang in the close() call. This change implements proper STREAM(TCP) closing handshake mechanism by sending the FIN to the peer and the waiting for the peer's FIN to arrive for a given timeout. On timeout, it will try to terminate the connection (i.e. a RST). This is in-line with other socket providers such as virtio. This change does not address the hang in the vmbus_hvsock_device_unregister where it waits indefinitely for the host to rescind the channel. That should be taken up as a separate fix. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	atm: iphase: Avoid copying pointers to user space.	Fuqian Huang	1	-6/+0
	Remove the MEMDUMP_DEV case in ia_ioctl to avoid copy pointers to user space. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	net/mlx5e: Fix calling wrong function to get inner vlan key and mask	Jianbo Liu	1	-1/+1
	When flow_rule_match_XYZ() functions were first introduced, flow_rule_match_cvlan() for inner vlan is missing. In mlx5_core driver, to get inner vlan key and mask, flow_rule_match_vlan() is just called, which is wrong because it obtains outer vlan information by FLOW_DISSECTOR_KEY_VLAN. This commit fixes this by changing to call flow_rule_match_cvlan() after it's added. Fixes: 8f2566225ae2 ("flow_offload: add flow_rule and flow_match structures and use them") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	flow_offload: support CVLAN match	Edward Cree	2	-0/+9
	Plumb it through from the flow_dissector. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	tools/bpftool: move set_max_rlimit() before __bpf_object__open_xattr()	Yonghong Song	1	-2/+2
	For a host which has a lower rlimit for max locked memory (e.g., 64KB), the following error occurs in one of our production systems: # /usr/sbin/bpftool prog load /paragon/pods/52877437/home/mark.o \ /sys/fs/bpf/paragon_mark_21 type cgroup/skb \ map idx 0 pinned /sys/fs/bpf/paragon_map_21 libbpf: Error in bpf_object__probe_name():Operation not permitted(1). Couldn't load basic 'r0 = 0' BPF program. Error: failed to open object file The reason is due to low locked memory during bpf_object__probe_name() which probes whether program name is supported in kernel or not during __bpf_object__open_xattr(). bpftool program load already tries to relax mlock rlimit before bpf_object__load(). Let us move set_max_rlimit() before __bpf_object__open_xattr(), which fixed the issue here. Fixes: 47eff61777c7 ("bpf, libbpf: introduce bpf_object__probe_caps to test BPF capabilities") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-16	selftests/bpf: add test_sysctl and map_tests/tests.h to .gitignore	Stanislav Fomichev	2	-0/+2
	Missing files are: * tools/testing/selftests/bpf/map_tests/tests.h - autogenerated * tools/testing/selftests/bpf/test_sysctl - binary Fixes: 51a0e301a563 ("bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps") Fixes: 1f5fa9ab6e2e ("selftests/bpf: Test BPF_CGROUP_SYSCTL") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-16	bpf: relax inode permission check for retrieving bpf program	Chenbo Feng	1	-1/+1
	For iptable module to load a bpf program from a pinned location, it only retrieve a loaded program and cannot change the program content so requiring a write permission for it might not be necessary. Also when adding or removing an unrelated iptable rule, it might need to flush and reload the xt_bpf related rules as well and triggers the inode permission check. It might be better to remove the write premission check for the inode so we won't need to grant write access to all the processes that flush and restore iptables rules. Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-16	rhashtable: Fix cmpxchg RCU warnings	Herbert Xu	1	-2/+3
	As cmpxchg is a non-RCU mechanism it will cause sparse warnings when we use it for RCU. This patch adds explicit casts to silence those warnings. This should probably be moved to RCU itself in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	rhashtable: Remove RCU marking from rhash_lock_head	Herbert Xu	2	-40/+46
	The opaque type rhash_lock_head should not be marked with __rcu because it can never be dereferenced. We should apply the RCU marking when we turn it into a pointer which can be dereferenced. This patch does exactly that. This fixes a number of sparse warnings as well as getting rid of some unnecessary RCU checking. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16	bpf, tcp: correctly handle DONT_WAIT flags and timeo == 0	John Fastabend	1	-1/+4
	The tcp_bpf_wait_data() routine needs to check timeo != 0 before calling sk_wait_event() otherwise we may see unexpected stalls on receiver. Arika did all the leg work here I just formatted, posted and ran a few tests. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: Arika Chen <eaglesora@gmail.com> Suggested-by: Arika Chen <eaglesora@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-16	selftests/bpf: add prog detach to flow_dissector test	Stanislav Fomichev	1	-0/+1
	In case we are not running in a namespace (which we don't do by default), let's try to detach the bpf program that we use for eth_get_headlen tests. Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-16	selftests/bpf: add missing \n to flow_dissector CHECK errors	Stanislav Fomichev	1	-4/+4
	Otherwise, in case of an error, everything gets mushed together. Fixes: a5cb33464e53 ("selftests/bpf: make flow dissector tests more extensible") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-16	libbpf: don't fail when feature probing fails	Stanislav Fomichev	1	-1/+1
	Otherwise libbpf is unusable from unprivileged process with kernel.kernel.unprivileged_bpf_disabled=1. All I get is EPERM from the probes, even if I just want to open an ELF object and look at what progs/maps it has. Instead of dying on probes, let's just pr_debug the error and try to continue. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-15	tcp: do not recycle cloned skbs	Eric Dumazet	2	-2/+2
	It is illegal to change arbitrary fields in skb_shared_info if the skb is cloned. Before calling skb_zcopy_clear() we need to ensure this rule, therefore we need to move the test from sk_stream_alloc_skb() to sk_wmem_free_skb() Fixes: 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>