wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2021-04-14	net: macb: fix the restore of cmp registers	Claudiu Beznea	1	-1/+1
	Commit a14d273ba159 ("net: macb: restore cmp registers on resume path") introduces the restore of CMP registers on resume path. In case the IP doesn't support type 2 screeners (zero on DCFG8 register) the struct macb::rx_fs_list::list is not initialized and thus the list_for_each_entry(item, &bp->rx_fs_list.list, list) loop introduced in commit a14d273ba159 ("net: macb: restore cmp registers on resume path") will access an uninitialized list leading to crash. Thus, initialize the struct macb::rx_fs_list::list without taking into account if the IP supports type 2 screeners or not. Fixes: a14d273ba159 ("net: macb: restore cmp registers on resume path") Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14	vrf: fix a comment about loopback device	Nicolas Dichtel	1	-6/+4
	This is a leftover of the below commit. Fixes: 4f04256c983a ("net: vrf: Drop local rtable and rt6_info") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14	doc: move seg6_flowlabel to seg6-sysctl.rst	Nicolas Dichtel	2	-15/+13
	Let's have all seg6 sysctl at the same place. Fixes: a6dc6670cd7e ("ipv6: sr: Add documentation for seg_flowlabel sysctl") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14	ibmvnic: remove duplicate napi_schedule call in open function	Lijun Pan	1	-5/+0
	Remove the unnecessary napi_schedule() call in __ibmvnic_open() since interrupt_rx() calls napi_schedule_prep/__napi_schedule during every receive interrupt. Fixes: ed651a10875f ("ibmvnic: Updated reset handling") Signed-off-by: Lijun Pan <lijunp213@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14	ibmvnic: remove duplicate napi_schedule call in do_reset function	Lijun Pan	1	-5/+1
	During adapter reset, do_reset/do_hard_reset calls ibmvnic_open(), which will calls napi_schedule if previous state is VNIC_CLOSED (i.e, the reset case, and "ifconfig down" case). So there is no need for do_reset to call napi_schedule again at the end of the function though napi_schedule will neglect the request if napi is already scheduled. Fixes: ed651a10875f ("ibmvnic: Updated reset handling") Signed-off-by: Lijun Pan <lijunp213@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14	ibmvnic: avoid calling napi_disable() twice	Lijun Pan	1	-2/+1
	__ibmvnic_open calls napi_disable without checking whether NAPI polling has already been disabled or not. This could cause napi_disable being called twice, which could generate deadlock. For example, the first napi_disable will spin until NAPI_STATE_SCHED is cleared by napi_complete_done, then set it again. When napi_disable is called the second time, it will loop infinitely because no dev->poll will be running to clear NAPI_STATE_SCHED. To prevent above scenario from happening, call ibmvnic_napi_disable() which checks if napi is disabled or not before calling napi_disable. Fixes: bfc32f297337 ("ibmvnic: Move resource initialization to its own routine") Suggested-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Lijun Pan <lijunp213@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14	r8169: don't advertise pause in jumbo mode	Heiner Kallweit	1	-2/+7
	It has been reported [0] that using pause frames in jumbo mode impacts performance. There's no available chip documentation, but vendor drivers r8168 and r8125 don't advertise pause in jumbo mode. So let's do the same, according to Roman it fixes the issue. [0] https://bugzilla.kernel.org/show_bug.cgi?id=212617 Fixes: 9cf9b84cc701 ("r8169: make use of phy_set_asym_pause") Reported-by: Roman Mamedov <rm+bko@romanrm.net> Tested-by: Roman Mamedov <rm+bko@romanrm.net> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14	ethtool: pause: make sure we init driver stats	Jakub Kicinski	1	-4/+4
	The intention was for pause statistics to not be reported when driver does not have the relevant callback (only report an empty netlink nest). What happens currently we report all 0s instead. Make sure statistics are initialized to "not set" (which is -1) so the dumping code skips them. Fixes: 9a27a33027f2 ("ethtool: add standard pause stats") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13	xen-netback: Check for hotplug-status existence before watching	Michael Brown	1	-4/+8
	The logic in connect() is currently written with the assumption that xenbus_watch_pathfmt() will return an error for a node that does not exist. This assumption is incorrect: xenstore does allow a watch to be registered for a nonexistent node (and will send notifications should the node be subsequently created). As of commit 1f2565780 ("xen-netback: remove 'hotplug-status' once it has served its purpose"), this leads to a failure when a domU transitions into XenbusStateConnected more than once. On the first domU transition into Connected state, the "hotplug-status" node will be deleted by the hotplug_status_changed() callback in dom0. On the second or subsequent domU transition into Connected state, the hotplug_status_changed() callback will therefore never be invoked, and so the backend will remain stuck in InitWait. This failure prevents scenarios such as reloading the xen-netfront module within a domU, or booting a domU via iPXE. There is unfortunately no way for the domU to work around this dom0 bug. Fix by explicitly checking for existence of the "hotplug-status" node, thereby creating the behaviour that was previously assumed to exist. Signed-off-by: Michael Brown <mbrown@fensystems.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13	gro: ensure frag0 meets IP header alignment	Eric Dumazet	1	-1/+2
	After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head") Guenter Roeck reported one failure in his tests using sh architecture. After much debugging, we have been able to spot silent unaligned accesses in inet_gro_receive() The issue at hand is that upper networking stacks assume their header is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN bytes before the Ethernet header to make that happen. This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path if the fragment is not properly aligned. Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN as 0, this extra check will be a NOP for them. Note that if frag0 is not used, GRO will call pskb_may_pull() as many times as needed to pull network and transport headers. Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head") Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13	net/sctp: fix race condition in sctp_destroy_sock	Or Cohen	1	-8/+5
	If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock held and sp->do_auto_asconf is true, then an element is removed from the auto_asconf_splist without any proper locking. This can happen in the following functions: 1. In sctp_accept, if sctp_sock_migrate fails. 2. In inet_create or inet6_create, if there is a bpf program attached to BPF_CGROUP_INET_SOCK_CREATE which denies creation of the sctp socket. The bug is fixed by acquiring addr_wq_lock in sctp_destroy_sock instead of sctp_close. This addresses CVE-2021-23133. Reported-by: Or Cohen <orcohen@paloaltonetworks.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications") Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13	ibmvnic: correctly use dev_consume/free_skb_irq	Lijun Pan	1	-4/+7
	It is more correct to use dev_kfree_skb_irq when packets are dropped, and to use dev_consume_skb_irq when packets are consumed. Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls") Suggested-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Lijun Pan <lijunp213@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13	net: Make tcp_allowed_congestion_control readonly in non-init netns	Jonathon Reinhart	1	-3/+13
	Currently, tcp_allowed_congestion_control is global and writable; writing to it in any net namespace will leak into all other net namespaces. tcp_available_congestion_control and tcp_allowed_congestion_control are the only sysctls in ipv4_net_table (the per-netns sysctl table) with a NULL data pointer; their handlers (proc_tcp_available_congestion_control and proc_allowed_congestion_control) have no other way of referencing a struct net. Thus, they operate globally. Because ipv4_net_table does not use designated initializers, there is no easy way to fix up this one "bad" table entry. However, the data pointer updating logic shouldn't be applied to NULL pointers anyway, so we instead force these entries to be read-only. These sysctls used to exist in ipv4_table (init-net only), but they were moved to the per-net ipv4_net_table, presumably without realizing that tcp_allowed_congestion_control was writable and thus introduced a leak. Because the intent of that commit was only to know (i.e. read) "which congestion algorithms are available or allowed", this read-only solution should be sufficient. The logic added in recent commit 31c4d2f160eb: ("net: Ensure net namespace isolation of sysctls") does not and cannot check for NULL data pointers, because other table entries (e.g. /proc/sys/net/netfilter/nf_log/) have .data=NULL but use other methods (.extra2) to access the struct net. Fixes: 9cb8e048e5d9 ("net/ipv4/sysctl: show tcp_{allowed, available}_congestion_control in non-initial netns") Signed-off-by: Jonathon Reinhart <jonathon.reinhart@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13	net: ip6_tunnel: Unregister catch-all devices	Hristo Venev	1	-0/+10
	Similarly to the sit case, we need to remove the tunnels with no addresses that have been moved to another network namespace. Fixes: 0bd8762824e73 ("ip6tnl: add x-netns support") Signed-off-by: Hristo Venev <hristo@venev.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13	net: sit: Unregister catch-all devices	Hristo Venev	1	-2/+2
	A sit interface created without a local or a remote address is linked into the `sit_net::tunnels_wc` list of its original namespace. When deleting a network namespace, delete the devices that have been moved. The following script triggers a null pointer dereference if devices linked in a deleted `sit_net` remain: for i in `seq 1 30`; do ip netns add ns-test ip netns exec ns-test ip link add dev veth0 type veth peer veth1 ip netns exec ns-test ip link add dev sit$i type sit dev veth0 ip netns exec ns-test ip link set dev sit$i netns $$ ip netns del ns-test done for i in `seq 1 30`; do ip link del dev sit$i done Fixes: 5e6700b3bf98f ("sit: add support of x-netns") Signed-off-by: Hristo Venev <hristo@venev.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13	netfilter: nftables: clone set element expression template	Pablo Neira Ayuso	1	-12/+34
	memcpy() breaks when using connlimit in set elements. Use nft_expr_clone() to initialize the connlimit expression list, otherwise connlimit garbage collector crashes when walking on the list head copy. [ 493.064656] Workqueue: events_power_efficient nft_rhash_gc [nf_tables] [ 493.064685] RIP: 0010:find_or_evict+0x5a/0x90 [nf_conncount] [ 493.064694] Code: 2b 43 40 83 f8 01 77 0d 48 c7 c0 f5 ff ff ff 44 39 63 3c 75 df 83 6d 18 01 48 8b 43 08 48 89 de 48 8b 13 48 8b 3d ee 2f 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 03 48 83 [ 493.064699] RSP: 0018:ffffc90000417dc0 EFLAGS: 00010297 [ 493.064704] RAX: 0000000000000000 RBX: ffff888134f38410 RCX: 0000000000000000 [ 493.064708] RDX: 0000000000000000 RSI: ffff888134f38410 RDI: ffff888100060cc0 [ 493.064711] RBP: ffff88812ce594a8 R08: ffff888134f38438 R09: 00000000ebb9025c [ 493.064714] R10: ffffffff8219f838 R11: 0000000000000017 R12: 0000000000000001 [ 493.064718] R13: ffffffff82146740 R14: ffff888134f38410 R15: 0000000000000000 [ 493.064721] FS: 0000000000000000(0000) GS:ffff88840e440000(0000) knlGS:0000000000000000 [ 493.064725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 493.064729] CR2: 0000000000000008 CR3: 00000001330aa002 CR4: 00000000001706e0 [ 493.064733] Call Trace: [ 493.064737] nf_conncount_gc_list+0x8f/0x150 [nf_conncount] [ 493.064746] nft_rhash_gc+0x106/0x390 [nf_tables] Reported-by: Laura Garcia Liebana <nevola@gmail.com> Fixes: 409444522976 ("netfilter: nf_tables: add elements with stateful expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13	netfilter: x_tables: fix compat match/target pad out-of-bound write	Florian Westphal	4	-8/+8
	xt_compat_match/target_from_user doesn't check that zeroing the area to start of next rule won't write past end of allocated ruleset blob. Remove this code and zero the entire blob beforehand. Reported-by: syzbot+cfc0247ac173f597aaaa@syzkaller.appspotmail.com Reported-by: Andy Nguyen <theflow@google.com> Fixes: 9fa492cdc160c ("[NETFILTER]: x_tables: simplify compat API") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-12	ethtool: fix kdoc attr name	Jakub Kicinski	1	-3/+3
	Add missing 't' in attrtype. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12	net: phy: marvell: fix detection of PHY on Topaz switches	Pali Rohár	3	-22/+45
	Since commit fee2d546414d ("net: phy: marvell: mv88e6390 temperature sensor reading"), Linux reports the temperature of Topaz hwmon as constant -75°C. This is because switches from the Topaz family (88E6141 / 88E6341) have the address of the temperature sensor register different from Peridot. This address is instead compatible with 88E1510 PHYs, as was used for Topaz before the above mentioned commit. Create a new mapping table between switch family and PHY ID for families which don't have a model number. And define PHY IDs for Topaz and Peridot families. Create a new PHY ID and a new PHY driver for Topaz's internal PHY. The only difference from Peridot's PHY driver is the HWMON probing method. Prior this change Topaz's internal PHY is detected by kernel as: PHY [...] driver [Marvell 88E6390] (irq=63) And afterwards as: PHY [...] driver [Marvell 88E6341 Family] (irq=63) Signed-off-by: Pali Rohár <pali@kernel.org> BugLink: https://github.com/globalscaletechnologies/linux/issues/1 Fixes: fee2d546414d ("net: phy: marvell: mv88e6390 temperature sensor reading") Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11	net: geneve: check skb is large enough for IPv4/IPv6 header	Phillip Potter	1	-0/+6
	Check within geneve_xmit_skb/geneve6_xmit_skb that sk_buff structure is large enough to include IPv4 or IPv6 header, and reject if not. The geneve_xmit_skb portion and overall idea was contributed by Eric Dumazet. Fixes a KMSAN-found uninit-value bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=abe95dc3e3e9667fc23b8d81f29ecad95c6f106f Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+2e406a9ac75bb71d4b7a@syzkaller.appspotmail.com Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11	net: davicom: Fix regulator not turned off on failed probe	Christophe JAILLET	1	-2/+4
	When the probe fails, we must disable the regulator that was previously enabled. This patch is a follow-up to commit ac88c531a5b3 ("net: davicom: Fix regulator not turned off on failed probe") which missed one case. Fixes: 7994fe55a4a2 ("dm9000: Add regulator and reset support to dm9000") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11	MAINTAINERS: update maintainer entry for freescale fec driver	Joakim Zhang	1	-1/+1
	Update maintainer entry for freescale fec driver. Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-10	netfilter: arp_tables: add pre_exit hook for table unregister	Florian Westphal	3	-5/+19
	Same problem that also existed in iptables/ip(6)tables, when arptable_filter is removed there is no longer a wait period before the table/ruleset is free'd. Unregister the hook in pre_exit, then remove the table in the exit function. This used to work correctly because the old nf_hook_unregister API did unconditional synchronize_net. The per-net hook unregister function uses call_rcu instead. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-10	netfilter: bridge: add pre_exit hooks for ebtable unregistration	Florian Westphal	5	-8/+51
	Just like ip/ip6/arptables, the hooks have to be removed, then synchronize_rcu() has to be called to make sure no more packets are being processed before the ruleset data is released. Place the hook unregistration in the pre_exit hook, then call the new ebtables pre_exit function from there. Years ago, when first netns support got added for netfilter+ebtables, this used an older (now removed) netfilter hook unregister API, that did a unconditional synchronize_rcu(). Now that all is done with call_rcu, ebtable_{filter,nat,broute} pernet exit handlers may free the ebtable ruleset while packets are still in flight. This can only happens on module removal, not during netns exit. The new function expects the table name, not the table struct. This is because upcoming patch set (targeting -next) will remove all net->xt.{nat,filter,broute}_table instances, this makes it necessary to avoid external references to those member variables. The existing APIs will be converted, so follow the upcoming scheme of passing name + hook type instead. Fixes: aee12a0a3727e ("ebtables: remove nf_hook_register usage") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-10	netfilter: nft_limit: avoid possible divide error in nft_limit_init	Eric Dumazet	1	-2/+2
	div_u64() divides u64 by u32. nft_limit_init() wants to divide u64 by u64, use the appropriate math function (div64_u64) divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8390 Comm: syz-executor188 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:div_u64_rem include/linux/math64.h:28 [inline] RIP: 0010:div_u64 include/linux/math64.h:127 [inline] RIP: 0010:nft_limit_init+0x2a2/0x5e0 net/netfilter/nft_limit.c:85 Code: ef 4c 01 eb 41 0f 92 c7 48 89 de e8 38 a5 22 fa 4d 85 ff 0f 85 97 02 00 00 e8 ea 9e 22 fa 4c 0f af f3 45 89 ed 31 d2 4c 89 f0 <49> f7 f5 49 89 c6 e8 d3 9e 22 fa 48 8d 7d 48 48 b8 00 00 00 00 00 RSP: 0018:ffffc90009447198 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000200000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff875152e6 RDI: 0000000000000003 RBP: ffff888020f80908 R08: 0000200000000000 R09: 0000000000000000 R10: ffffffff875152d8 R11: 0000000000000000 R12: ffffc90009447270 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 000000000097a300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200001c4 CR3: 0000000026a52000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: nf_tables_newexpr net/netfilter/nf_tables_api.c:2675 [inline] nft_expr_init+0x145/0x2d0 net/netfilter/nf_tables_api.c:2713 nft_set_elem_expr_alloc+0x27/0x280 net/netfilter/nf_tables_api.c:5160 nf_tables_newset+0x1997/0x3150 net/netfilter/nf_tables_api.c:4321 nfnetlink_rcv_batch+0x85a/0x21b0 net/netfilter/nfnetlink.c:456 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:580 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:598 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c26844eda9d4 ("netfilter: nf_tables: Fix nft limit burst handling") Fixes: 3e0f64b7dd31 ("netfilter: nft_limit: fix packet ratelimiting") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Luigi Rizzo <lrizzo@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-09	net: fix hangup on napi_disable for threaded napi	Paolo Abeni	1	-1/+2
	napi_disable() is subject to an hangup, when the threaded mode is enabled and the napi is under heavy traffic. If the relevant napi has been scheduled and the napi_disable() kicks in before the next napi_threaded_wait() completes - so that the latter quits due to the napi_disable_pending() condition, the existing code leaves the NAPI_STATE_SCHED bit set and the napi_disable() loop waiting for such bit will hang. This patch addresses the issue by dropping the NAPI_STATE_DISABLE bit test in napi_thread_wait(). The later napi_threaded_poll() iteration will take care of clearing the NAPI_STATE_SCHED. This also addresses a related problem reported by Jakub: before this patch a napi_disable()/napi_enable() pair killed the napi thread, effectively disabling the threaded mode. On the patched kernel napi_disable() simply stops scheduling the relevant thread. v1 -> v2: - let the main napi_thread_poll() loop clear the SCHED bit Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/883923fa22745a9589e8610962b7dc59df09fb1f.1617981844.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09	net: hns3: Trivial spell fix in hns3 driver	Salil Mehta	2	-3/+3
	Some trivial spelling mistakes which caught my eye during the review of the code. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Link: https://lore.kernel.org/r/20210409074223.32480-1-salil.mehta@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09	lan743x: fix ethernet frame cutoff issue	Sven Van Asbroeck	1	-4/+4
	The ethernet frame length is calculated incorrectly. Depending on the value of RX_HEAD_PADDING, this may result in ethernet frames that are too short (cut off at the end), or too long (garbage added to the end). Fix by calculating the ethernet frame length correctly. For added clarity, use the ETH_FCS_LEN constant in the calculation. Many thanks to Heiner Kallweit for suggesting this solution. Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Fixes: 3e21a10fdea3 ("lan743x: trim all 4 bytes of the FCS; not just 2") Link: https://lore.kernel.org/lkml/20210408172353.21143-1-TheSven73@gmail.com/ Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Reviewed-by: George McCollister <george.mccollister@gmail.com> Tested-by: George McCollister <george.mccollister@gmail.com> Link: https://lore.kernel.org/r/20210409003904.8957-1-TheSven73@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09	of: property: fw_devlink: do not link ".*,nr-gpios"	Ilya Lipnitskiy	1	-1/+10
	[<vendor>,]nr-gpios property is used by some GPIO drivers[0] to indicate the number of GPIOs present on a system, not define a GPIO. nr-gpios is not configured by #gpio-cells and can't be parsed along with other "*-gpios" properties. nr-gpios without the "<vendor>," prefix is not allowed by the DT spec[1], so only add exception for the ",nr-gpios" suffix and let the error message continue being printed for non-compliant implementations. [0] nr-gpios is referenced in Documentation/devicetree/bindings/gpio: - gpio-adnp.txt - gpio-xgene-sb.txt - gpio-xlp.txt - snps,dw-apb-gpio.yaml [1] Link: https://github.com/devicetree-org/dt-schema/blob/cb53a16a1eb3e2169ce170c071e47940845ec26e/schemas/gpio/gpio-consumer.yaml#L20 Fixes errors such as: OF: /palmbus@300000/gpio@600: could not find phandle Fixes: 7f00be96f125 ("of: property: Add device link support for interrupt-parent, dmas and -gpio(s)") Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Cc: Saravana Kannan <saravanak@google.com> Cc: stable@vger.kernel.org # v5.5+ Link: https://lore.kernel.org/r/20210405222540.18145-1-ilya.lipnitskiy@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>
2021-04-09	dt-bindings:iio:adc: update motorola,cpcap-adc.yaml reference	Mauro Carvalho Chehab	1	-1/+1
	Changeset 1ca9d1b1342d ("dt-bindings:iio:adc:motorola,cpcap-adc yaml conversion") renamed: Documentation/devicetree/bindings/iio/adc/cpcap-adc.txt to: Documentation/devicetree/bindings/iio/adc/motorola,cpcap-adc.yaml. Update its cross-reference accordingly. Fixes: 1ca9d1b1342d ("dt-bindings:iio:adc:motorola,cpcap-adc yaml conversion") Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/3e205e5fa701e4bc15d39d6ac1f57717df2bb4c6.1617972339.git.mchehab+huawei@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2021-04-09	dt-bindings: fix references for iio-bindings.txt	Mauro Carvalho Chehab	5	-6/+14
	The iio-bindings.txt was converted into two files and merged at the dt-schema git tree at: https://github.com/devicetree-org/dt-schema Yet, some documents still refer to the old file. Fix their references, in order to point to the right URL. Fixes: dba91f82d580 ("dt-bindings:iio:iio-binding.txt Drop file as content now in dt-schema") Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/4efd81eca266ca0875d3bf9d1672097444146c69.1617972339.git.mchehab+huawei@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2021-04-09	dt-bindings: don't use ../dir for doc references	Mauro Carvalho Chehab	2	-9/+9
	As documents have been renamed and moved around, their references will break, but this will be unnoticed, as the script which checks for it won't handle "../" references. So, replace them by the full patch. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/68d3a1244119d1f2829c375b0ef554cf348bc89f.1617972339.git.mchehab+huawei@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2021-04-09	vdpa/mlx5: Fix suspend/resume index restoration	Eli Cohen	1	-13/+8
	When we suspend the VM, the VDPA interface will be reset. When the VM is resumed again, clear_virtqueues() will clear the available and used indices resulting in hardware virqtqueue objects becoming out of sync. We can avoid this function alltogether since qemu will clear them if required, e.g. when the VM went through a reboot. Moreover, since the hw available and used indices should always be identical on query and should be restored to the same value same value for virtqueues that complete in order, we set the single value provided by set_vq_state(). In get_vq_state() we return the value of hardware used index. Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20210408091047.4269-6-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-04-09	vdpa/mlx5: Fix wrong use of bit numbers	Eli Cohen	1	-2/+2
	VIRTIO_F_VERSION_1 is a bit number. Use BIT_ULL() with mask conditionals. Also, in mlx5_vdpa_is_little_endian() use BIT_ULL for consistency with the rest of the code. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20210408091047.4269-5-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-04-09	vdpa/mlx5: Retrieve BAR address suitable any function	Eli Cohen	1	-1/+2
	struct mlx5_core_dev has a bar_addr field that contains the correct bar address for the function regardless of whether it is pci function or sub function. Use it. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20210408091047.4269-4-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-04-09	vdpa/mlx5: Use the correct dma device when registering memory	Eli Cohen	1	-2/+7
	In cases where the vdpa instance uses a SF (sub function), the DMA device is the parent device. Use a function to retrieve the correct DMA device. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20210408091047.4269-3-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-04-09	vdpa/mlx5: should exclude header length and fcs from mtu	Si-Wei Liu	2	-1/+18
	When feature VIRTIO_NET_F_MTU is negotiated on mlx5_vdpa, 22 extra bytes worth of MTU length is shown in guest. This is because the mlx5_query_port_max_mtu API returns the "hardware" MTU value, which does not just contain the Ethernet payload, but includes extra lengths starting from the Ethernet header up to the FCS altogether. Fix the MTU so packets won't get dropped silently. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20210408091047.4269-2-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-04-09	Bluetooth: btusb: Revert Fix the autosuspend enable and disable	Hans de Goede	1	-5/+2
	drivers/usb/core/hub.c: usb_new_device() contains the following: /* By default, forbid autosuspend for all devices. It will be * allowed for hubs during binding. */ usb_disable_autosuspend(udev); So for anything which is not a hub, such as btusb devices, autosuspend is disabled by default and we must call usb_enable_autosuspend(udev) to enable it. This means that the "Fix the autosuspend enable and disable" commit, which drops the usb_enable_autosuspend() call when the enable_autosuspend module option is true, is completely wrong, revert it. This reverts commit 7bd9fb058d77213130e4b3e594115c028b708e7e. Cc: Hui Wang <hui.wang@canonical.com> Fixes: 7bd9fb058d77 ("Bluetooth: btusb: Fix the autosuspend enable and disable") Acked-by: Hui Wang <hui.wang@canonical.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-08	net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh	Muhammad Usama Anjum	1	-3/+5
	nlh is being checked for validtity two times when it is dereferenced in this function. Check for validity again when updating the flags through nlh pointer to make the dereferencing safe. CC: <stable@vger.kernel.org> Addresses-Coverity: ("NULL pointer dereference") Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08	net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits	Martin Blumenstingl	1	-3/+16
	There are a few more bits in the GSWIP_MII_CFG register for which we did rely on the boot-loader (or the hardware defaults) to set them up properly. For some external RMII PHYs we need to select the GSWIP_MII_CFG_RMII_CLK bit and also we should un-set it for non-RMII PHYs. The GSWIP_MII_CFG_RMII_CLK bit is ignored for other PHY connection modes. The GSWIP IP also supports in-band auto-negotiation for RGMII PHYs when the GSWIP_MII_CFG_RGMII_IBS bit is set. Clear this bit always as there's no known hardware which uses this (so it is not tested yet). Clear the xMII isolation bit when set at initialization time if it was previously set by the bootloader. Not doing so could lead to no traffic (neither RX nor TX) on a port with this bit set. While here, also add the GSWIP_MII_CFG_RESET bit. We don't need to manage it because this bit is self-clearning when set. We still add it here to get a better overview of the GSWIP_MII_CFG register. Fixes: 14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Cc: stable@vger.kernel.org Suggested-by: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08	net: dsa: lantiq_gswip: Don't use PHY auto polling	Martin Blumenstingl	1	-26/+159
	PHY auto polling on the GSWIP hardware can be used so link changes (speed, link up/down, etc.) can be detected automatically. Internally GSWIP reads the PHY's registers for this functionality. Based on this automatic detection GSWIP can also automatically re-configure it's port settings. Unfortunately this auto polling (and configuration) mechanism seems to cause various issues observed by different people on different devices: - FritzBox 7360v2: the two Gbit/s ports (connected to the two internal PHY11G instances) are working fine but the two Fast Ethernet ports (using an AR8030 RMII PHY) are completely dead (neither RX nor TX are received). It turns out that the AR8030 PHY sets the BMSR_ESTATEN bit as well as the ESTATUS_1000_TFULL and ESTATUS_1000_XFULL bits. This makes the PHY auto polling state machine (rightfully?) think that the established link speed (when the other side is Gbit/s capable) is 1Gbit/s. - None of the Ethernet ports on the Zyxel P-2812HNU-F1 (two are connected to the internal PHY11G GPHYs while the other three are external RGMII PHYs) are working. Neither RX nor TX traffic was observed. It is not clear which part of the PHY auto polling state- machine caused this. - FritzBox 7412 (only one LAN port which is connected to one of the internal GPHYs running in PHY22F / Fast Ethernet mode) was seeing random disconnects (link down events could be seen). Sometimes all traffic would stop after such disconnect. It is not clear which part of the PHY auto polling state-machine cauased this. - TP-Link TD-W9980 (two ports are connected to the internal GPHYs running in PHY11G / Gbit/s mode, the other two are external RGMII PHYs) was affected by similar issues as the FritzBox 7412 just without the "link down" events Switch to software based configuration instead of PHY auto polling (and letting the GSWIP hardware configure the ports automatically) for the following link parameters: - link up/down - link speed - full/half duplex - flow control (RX / TX pause) After a big round of manual testing by various people (who helped test this on OpenWrt) it turns out that this fixes all reported issues. Additionally it can be considered more future proof because any "quirk" which is implemented for a PHY on the driver side can now be used with the GSWIP hardware as well because Linux is in control of the link parameters. As a nice side-effect this also solves a problem where fixed-links were not supported previously because we were relying on the PHY auto polling mechanism, which cannot work for fixed-links as there's no PHY from where it can read the registers. Configuring the link settings on the GSWIP ports means that we now use the settings from device-tree also for ports with fixed-links. Fixes: 14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Fixes: 3e6fdeb28f4c33 ("net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock") Cc: stable@vger.kernel.org Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08	of: unittest: overlay: ensure proper alignment of copied FDT	Frank Rowand	3	-10/+23
	The Devicetree standard specifies an 8 byte alignment of the FDT. Code in libfdt expects this alignment for an FDT image in memory. kmemdup() returns 4 byte alignment on openrisc. Replace kmemdup() with kmalloc(), align pointer, memcpy() to get proper alignment. The 4 byte alignment exposed a related bug which triggered a crash on openrisc with: commit 79edff12060f ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9") as reported in: https://lore.kernel.org/lkml/20210327224116.69309-1-linux@roeck-us.net/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Frank Rowand <frank.rowand@sony.com> Link: https://lore.kernel.org/r/20210408204508.2276230-1-frowand.list@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>
2021-04-08	net: sched: sch_teql: fix null-pointer dereference	Pavel Tikhomirov	1	-0/+3
	Reproduce: modprobe sch_teql tc qdisc add dev teql0 root teql0 This leads to (for instance in Centos 7 VM) OOPS: [ 532.366633] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 532.366733] IP: [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.366825] PGD 80000001376d5067 PUD 137e37067 PMD 0 [ 532.366906] Oops: 0000 [#1] SMP [ 532.366987] Modules linked in: sch_teql ... [ 532.367945] CPU: 1 PID: 3026 Comm: tc Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1 [ 532.368041] Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.2 04/01/2014 [ 532.368125] task: ffff8b7d37d31070 ti: ffff8b7c9fdbc000 task.ti: ffff8b7c9fdbc000 [ 532.368224] RIP: 0010:[<ffffffffc06124a8>] [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.368320] RSP: 0018:ffff8b7c9fdbf8e0 EFLAGS: 00010286 [ 532.368394] RAX: ffffffffc0612490 RBX: ffff8b7cb1565e00 RCX: ffff8b7d35ba2000 [ 532.368476] RDX: ffff8b7d35ba2000 RSI: 0000000000000000 RDI: ffff8b7cb1565e00 [ 532.368557] RBP: ffff8b7c9fdbf8f8 R08: ffff8b7d3fd1f140 R09: ffff8b7d3b001600 [ 532.368638] R10: ffff8b7d3b001600 R11: ffffffff84c7d65b R12: 00000000ffffffd8 [ 532.368719] R13: 0000000000008000 R14: ffff8b7d35ba2000 R15: ffff8b7c9fdbf9a8 [ 532.368800] FS: 00007f6a4e872740(0000) GS:ffff8b7d3fd00000(0000) knlGS:0000000000000000 [ 532.368885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 532.368961] CR2: 00000000000000a8 CR3: 00000001396ee000 CR4: 00000000000206e0 [ 532.369046] Call Trace: [ 532.369159] [<ffffffff84c8192e>] qdisc_create+0x36e/0x450 [ 532.369268] [<ffffffff846a9b49>] ? ns_capable+0x29/0x50 [ 532.369366] [<ffffffff849afde2>] ? nla_parse+0x32/0x120 [ 532.369442] [<ffffffff84c81b4c>] tc_modify_qdisc+0x13c/0x610 [ 532.371508] [<ffffffff84c693e7>] rtnetlink_rcv_msg+0xa7/0x260 [ 532.372668] [<ffffffff84907b65>] ? sock_has_perm+0x75/0x90 [ 532.373790] [<ffffffff84c69340>] ? rtnl_newlink+0x890/0x890 [ 532.374914] [<ffffffff84c8da7b>] netlink_rcv_skb+0xab/0xc0 [ 532.376055] [<ffffffff84c63708>] rtnetlink_rcv+0x28/0x30 [ 532.377204] [<ffffffff84c8d400>] netlink_unicast+0x170/0x210 [ 532.378333] [<ffffffff84c8d7a8>] netlink_sendmsg+0x308/0x420 [ 532.379465] [<ffffffff84c2f3a6>] sock_sendmsg+0xb6/0xf0 [ 532.380710] [<ffffffffc034a56e>] ? __xfs_filemap_fault+0x8e/0x1d0 [xfs] [ 532.381868] [<ffffffffc034a75c>] ? xfs_filemap_fault+0x2c/0x30 [xfs] [ 532.383037] [<ffffffff847ec23a>] ? __do_fault.isra.61+0x8a/0x100 [ 532.384144] [<ffffffff84c30269>] ___sys_sendmsg+0x3e9/0x400 [ 532.385268] [<ffffffff847f3fad>] ? handle_mm_fault+0x39d/0x9b0 [ 532.386387] [<ffffffff84d88678>] ? __do_page_fault+0x238/0x500 [ 532.387472] [<ffffffff84c31921>] __sys_sendmsg+0x51/0x90 [ 532.388560] [<ffffffff84c31972>] SyS_sendmsg+0x12/0x20 [ 532.389636] [<ffffffff84d8dede>] system_call_fastpath+0x25/0x2a [ 532.390704] [<ffffffff84d8de21>] ? system_call_after_swapgs+0xae/0x146 [ 532.391753] Code: 00 00 00 00 00 00 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b b7 48 01 00 00 48 89 fb <48> 8b 8e a8 00 00 00 48 85 c9 74 43 48 89 ca eb 0f 0f 1f 80 00 [ 532.394036] RIP [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.395127] RSP <ffff8b7c9fdbf8e0> [ 532.396179] CR2: 00000000000000a8 Null pointer dereference happens on master->slaves dereference in teql_destroy() as master is null-pointer. When qdisc_create() calls teql_qdisc_init() it imediately fails after check "if (m->dev == dev)" because both devices are teql0, and it does not set qdisc_priv(sch)->m leaving it zero on error path, then qdisc_create() imediately calls teql_destroy() which does not expect zero master pointer and we get OOPS. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08	ipv6: report errors for iftoken via netlink extack	Stephen Hemminger	4	-10/+31
	Setting iftoken can fail for several different reasons but there and there was no report to user as to the cause. Add netlink extended errors to the processing of the request. This requires adding additional argument through rtnl_af_ops set_link_af callback. Reported-by: Hongren Zheng <li@zenithal.me> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08	net: sched: fix err handler in tcf_action_init()	Vlad Buslov	3	-19/+19
	With recent changes that separated action module load from action initialization tcf_action_init() function error handling code was modified to manually release the loaded modules if loading/initialization of any further action in same batch failed. For the case when all modules successfully loaded and some of the actions were initialized before one of them failed in init handler. In this case for all previous actions the module will be released twice by the error handler: First time by the loop that manually calls module_put() for all ops, and second time by the action destroy code that puts the module after destroying the action. Reproduction: $ sudo tc actions add action simple sdata \"2\" index 2 $ sudo tc actions add action simple sdata \"1\" index 1 \ action simple sdata \"2\" index 2 RTNETLINK answers: File exists We have an error talking to the kernel $ sudo tc actions ls action simple total acts 1 action order 0: Simple <"2"> index 2 ref 1 bind 0 $ sudo tc actions flush action simple $ sudo tc actions ls action simple $ sudo tc actions add action simple sdata \"2\" index 2 Error: Failed to load TC action module. We have an error talking to the kernel $ lsmod \| grep simple act_simple 20480 -1 Fix the issue by modifying module reference counting handling in action initialization code: - Get module reference in tcf_idr_create() and put it in tcf_idr_release() instead of taking over the reference held by the caller. - Modify users of tcf_action_init_1() to always release the module reference which they obtain before calling init function instead of assuming that created action takes over the reference. - Finally, modify tcf_action_init_1() to not release the module reference when overwriting existing action as this is no longer necessary since both upper and lower layers obtain and manage their own module references independently. Fixes: d349f9976868 ("net_sched: fix RTNL deadlock again caused by request_module()") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08	net: sched: fix action overwrite reference counting	Vlad Buslov	3	-13/+23
	Action init code increments reference counter when it changes an action. This is the desired behavior for cls API which needs to obtain action reference for every classifier that points to action. However, act API just needs to change the action and releases the reference before returning. This sequence breaks when the requested action doesn't exist, which causes act API init code to create new action with specified index, but action is still released before returning and is deleted (unless it was referenced concurrently by cls API). Reproduction: $ sudo tc actions ls action gact $ sudo tc actions change action gact drop index 1 $ sudo tc actions ls action gact Extend tcf_action_init() to accept 'init_res' array and initialize it with action->ops->init() result. In tcf_action_add() remove pointers to created actions from actions array before passing it to tcf_action_put_many(). Fixes: cae422f379f3 ("net: sched: use reference counting action init") Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08	Revert "net: sched: bump refcount for new action in ACT replace mode"	Vlad Buslov	1	-3/+0
	This reverts commit 6855e8213e06efcaf7c02a15e12b1ae64b9a7149. Following commit in series fixes the issue without introducing regression in error rollback of tcf_action_destroy(). Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08	io-wq: cancel unbounded works on io-wq destroy	Pavel Begunkov	1	-0/+4
	WARNING: CPU: 5 PID: 227 at fs/io_uring.c:8578 io_ring_exit_work+0xe6/0x470 RIP: 0010:io_ring_exit_work+0xe6/0x470 Call Trace: process_one_work+0x206/0x400 worker_thread+0x4a/0x3d0 kthread+0x129/0x170 ret_from_fork+0x22/0x30 INFO: task lfs-openat:2359 blocked for more than 245 seconds. task:lfs-openat state:D stack: 0 pid: 2359 ppid: 1 flags:0x00000004 Call Trace: ... wait_for_completion+0x8b/0xf0 io_wq_destroy_manager+0x24/0x60 io_wq_put_and_exit+0x18/0x30 io_uring_clean_tctx+0x76/0xa0 __io_uring_files_cancel+0x1b9/0x2e0 do_exit+0xc0/0xb40 ... Even after io-wq destroy has been issued io-wq worker threads will continue executing all left work items as usual, and may hang waiting for I/O that won't ever complete (aka unbounded). [<0>] pipe_read+0x306/0x450 [<0>] io_iter_do_read+0x1e/0x40 [<0>] io_read+0xd5/0x330 [<0>] io_issue_sqe+0xd21/0x18a0 [<0>] io_wq_submit_work+0x6c/0x140 [<0>] io_worker_handle_work+0x17d/0x400 [<0>] io_wqe_worker+0x2c0/0x330 [<0>] ret_from_fork+0x22/0x30 Cancel all unbounded I/O instead of executing them. This changes the user visible behaviour, but that's inevitable as io-wq is not per task. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cd4b543154154cba055cf86f351441c2174d7f71.1617842918.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-08	io_uring: fix rw req completion	Pavel Begunkov	1	-0/+13
	WARNING: at fs/io_uring.c:8578 io_ring_exit_work.cold+0x0/0x18 As reissuing is now passed back by REQ_F_REISSUE and kiocb_done() internally uses __io_complete_rw(), it may stop after setting the flag so leaving a dangling request. There are tricky edge cases, e.g. reading beyound file, boundary, so the easiest way is to hand code reissue in kiocb_done() as __io_complete_rw() was doing for us before. Fixes: 230d50d448ac ("io_uring: move reissue into regular IO path") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f602250d292f8a84cca9a01d747744d1e797be26.1617842918.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-08	RDMA/addr: Be strict with gid size	Leon Romanovsky	1	-1/+3
	The nla_len() is less than or equal to 16. If it's less than 16 then end of the "gid" buffer is uninitialized. Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload") Link: https://lore.kernel.org/r/20210405074434.264221-1-leon@kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>