linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2020-07-24	flow_offload: Move rhashtable inclusion to the source file	Herbert Xu	3	-2/+1
	I noticed that touching linux/rhashtable.h causes lib/vsprintf.c to be rebuilt. This dependency came through a bogus inclusion in the file net/flow_offload.h. This patch moves it to the right place. This patch also removes a lingering rhashtable inclusion in cls_api created by the same commit. Fixes: 4e481908c51b ("flow_offload: move tc indirect block to...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23	geneve: fix an uninitialized value in geneve_changelink()	Cong Wang	1	-1/+1
	geneve_nl2info() sets 'df' conditionally, so we have to initialize it by copying the value from existing geneve device in geneve_changelink(). Fixes: 56c09de347e4 ("geneve: allow changing DF behavior after creation") Reported-by: syzbot+7ebc2e088af5e4c0c9fa@syzkaller.appspotmail.com Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23	bonding: check return value of register_netdevice() in bond_newlink()	Cong Wang	1	-2/+1
	Very similar to commit 544f287b8495 ("bonding: check error value of register_netdevice() immediately"), we should immediately check the return value of register_netdevice() before doing anything else. Fixes: 005db31d5f5f ("bonding: set carrier off for devices created through netlink") Reported-and-tested-by: syzbot+bbc3a11c4da63c1b74d6@syzkaller.appspotmail.com Cc: Beniamino Galvani <bgalvani@redhat.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23	tcp: allow at most one TLP probe per flight	Yuchung Cheng	3	-12/+18
	Previously TLP may send multiple probes of new data in one flight. This happens when the sender is cwnd limited. After the initial TLP containing new data is sent, the sender receives another ACK that acks partial inflight. It may re-arm another TLP timer to send more, if no further ACK returns before the next TLP timeout (PTO) expires. The sender may send in theory a large amount of TLP until send queue is depleted. This only happens if the sender sees such irregular uncommon ACK pattern. But it is generally undesirable behavior during congestion especially. The original TLP design restrict only one TLP probe per inflight as published in "Reducing Web Latency: the Virtue of Gentle Aggression", SIGCOMM 2013. This patch changes TLP to send at most one probe per inflight. Note that if the sender is app-limited, TLP retransmits old data and did not have this issue. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23	AX.25: Prevent integer overflows in connect and sendmsg	Dan Carpenter	1	-1/+4
	We recently added some bounds checking in ax25_connect() and ax25_sendmsg() and we so we removed the AX25_MAX_DIGIS checks because they were no longer required. Unfortunately, I believe they are required to prevent integer overflows so I have added them back. Fixes: 8885bb0621f0 ("AX.25: Prevent out-of-bounds read in ax25_sendmsg()") Fixes: 2f2a7ffad5c6 ("AX.25: Fix out-of-bounds read in ax25_connect()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22	cxgb4: add missing release on skb in uld_send()	Navid Emamdoost	1	-0/+1
	In the implementation of uld_send(), the skb is consumed on all execution paths except one. Release skb when returning NET_XMIT_DROP. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22	net: atlantic: fix PTP on AQC10X	Egor Pomozov	1	-1/+6
	This patch fixes PTP on AQC10X. PTP support on AQC10X requires FW involvement and FW configures the TPS data arb mode itself. So we must make sure driver doesn't touch TPS data arb mode on AQC10x if PTP is enabled. Otherwise, there are no timestamps even though packets are flowing. Fixes: 2deac71ac492a ("net: atlantic: QoS implementation: min_rate") Signed-off-by: Egor Pomozov <epomozov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22	AX.25: Prevent out-of-bounds read in ax25_sendmsg()	Peilin Ye	1	-1/+2
	Checks on `addr_len` and `usax->sax25_ndigis` are insufficient. ax25_sendmsg() can go out of bounds when `usax->sax25_ndigis` equals to 7 or 8. Fix it. It is safe to remove `usax->sax25_ndigis > AX25_MAX_DIGIS`, since `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)` Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22	sctp: shrink stream outq when fails to do addstream reconf	Xin Long	1	-2/+4
	When adding a stream with stream reconf, the new stream firstly is in CLOSED state but new out chunks can still be enqueued. Then once gets the confirmation from the peer, the state will change to OPEN. However, if the peer denies, it needs to roll back the stream. But when doing that, it only sets the stream outcnt back, and the chunks already in the new stream don't get purged. It caused these chunks can still be dequeued in sctp_outq_dequeue_data(). As its stream is still in CLOSE, the chunk will be enqueued to the head again by sctp_outq_head_data(). This chunk will never be sent out, and the chunks after it can never be dequeued. The assoc will be 'hung' in a dead loop of sending this chunk. To fix it, this patch is to purge these chunks already in the new stream by calling sctp_stream_shrink_out() when failing to do the addstream reconf. Fixes: 11ae76e67a17 ("sctp: implement receiver-side procedures for the Reconf Response Parameter") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22	sctp: shrink stream outq only when new outcnt < old outcnt	Xin Long	1	-7/+14
	It's not necessary to go list_for_each for outq->out_chunk_list when new outcnt >= old outcnt, as no chunk with higher sid than new (outcnt - 1) exists in the outqueue. While at it, also move the list_for_each code in a new function sctp_stream_shrink_out(), which will be used in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22	AX.25: Fix out-of-bounds read in ax25_connect()	Peilin Ye	1	-1/+3
	Checks on `addr_len` and `fsa->fsa_ax25.sax25_ndigis` are insufficient. ax25_connect() can go out of bounds when `fsa->fsa_ax25.sax25_ndigis` equals to 7 or 8. Fix it. This issue has been reported as a KMSAN uninit-value bug, because in such a case, ax25_connect() reaches into the uninitialized portion of the `struct sockaddr_storage` statically allocated in __sys_connect(). It is safe to remove `fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS` because `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)`. Reported-by: syzbot+c82752228ed975b0a623@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=55ef9d629f3b3d7d70b69558015b63b48d01af66 Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22	enetc: Remove the mdio bus on PF probe bailout	Claudiu Manoil	1	-0/+1
	For ENETC ports that register an external MDIO bus, the bus doesn't get removed on the error bailout path of enetc_pf_probe(). This issue became much more visible after recent: commit 07095c025ac2 ("net: enetc: Use DT protocol information to set up the ports") Before this commit, one could make probing fail on the error path only by having register_netdev() fail, which is unlikely. But after this commit, because it moved the enetc_of_phy_get() call up in the probing sequence, now we can trigger an mdiobus_free() bug just by forcing enetc_alloc_msix() to return error, i.e. with the 'pci=nomsi' kernel bootarg (since ENETC relies on MSI support to work), as the calltrace below shows: kernel BUG at /home/eiz/work/enetc/net/drivers/net/phy/mdio_bus.c:648! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [...] Hardware name: LS1028A RDB Board (DT) pstate: 80000005 (Nzcv daif -PAN -UAO BTYPE=--) pc : mdiobus_free+0x50/0x58 lr : devm_mdiobus_free+0x14/0x20 [...] Call trace: mdiobus_free+0x50/0x58 devm_mdiobus_free+0x14/0x20 release_nodes+0x138/0x228 devres_release_all+0x38/0x60 really_probe+0x1c8/0x368 driver_probe_device+0x5c/0xc0 device_driver_attach+0x74/0x80 __driver_attach+0x8c/0xd8 bus_for_each_dev+0x7c/0xd8 driver_attach+0x24/0x30 bus_add_driver+0x154/0x200 driver_register+0x64/0x120 __pci_register_driver+0x44/0x50 enetc_pf_driver_init+0x24/0x30 do_one_initcall+0x60/0x1c0 kernel_init_freeable+0x1fc/0x274 kernel_init+0x14/0x110 ret_from_fork+0x10/0x34 Fixes: ebfcb23d62ab ("enetc: Add ENETC PF level external MDIO support") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload	Murali Karicheri	1	-1/+2
	Currently drive supports taprio offload which is a tc feature offloaded to cpsw hardware. So driver has to set the hw feature flag, NETIF_F_HW_TC in the net device to be compliant. This patch adds the flag. Fixes: 8127224c2708 ("ethernet: ti: am65-cpsw-qos: add TAPRIO offload support") Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: ethernet: ave: Fix error returns in ave_init	Wang Hai	1	-1/+1
	When regmap_update_bits failed in ave_init(), calls of the functions reset_control_assert() and clk_disable_unprepare() were missed. Add goto out_reset_assert to do this. Fixes: 57878f2f4697 ("net: ethernet: ave: add support for phy-mode setting of system controller") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	drivers/net/wan/x25_asy: Fix to make it work	Xie He	1	-7/+14
	This driver is not working because of problems of its receiving code. This patch fixes it to make it work. When the driver receives an LAPB frame, it should first pass the frame to the LAPB module to process. After processing, the LAPB module passes the data (the packet) back to the driver, the driver should then add a one-byte pseudo header and pass the data to upper layers. The changes to the "x25_asy_bump" function and the "x25_asy_data_indication" function are to correctly implement this procedure. Also, the "x25_asy_unesc" function ignores any frame that is shorter than 3 bytes. However the shortest frames are 2-byte long. So we need to change it to allow 2-byte frames to pass. Cc: Eric Dumazet <edumazet@google.com> Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Reviewed-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22	ipvs: fix the connection sync failed in some cases	guodeqing	1	-4/+8
	The sync_thread_backup only checks sk_receive_queue is empty or not, there is a situation which cannot sync the connection entries when sk_receive_queue is empty and sk_rmem_alloc is larger than sk_rcvbuf, the sync packets are dropped in __udp_enqueue_schedule_skb, this is because the packets in reader_queue is not read, so the rmem is not reclaimed. Here I add the check of whether the reader_queue of the udp sock is empty or not to solve this problem. Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Reported-by: zhouxudong <zhouxudong8@huawei.com> Signed-off-by: guodeqing <geffrey.guo@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-21	selftest: txtimestamp: fix net ns entry logic	Paolo Pisati	1	-1/+1
	According to 'man 8 ip-netns', if `ip netns identify` returns an empty string, there's no net namespace associated with current PID: fix the net ns entrance logic. Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	qed: suppress false-positives interrupt error messages on HW init	Alexander Lobakin	3	-24/+32
	It was found that qed_pglueb_rbc_attn_handler() can produce a lot of false-positive error detections on driver load/reload (especially after crashes/recoveries) and spam the kernel log: [ 4.958275] [qed_pglueb_rbc_attn_handler:324()]ICPL error - 00d00ff0 [ 2079.146764] [qed_pglueb_rbc_attn_handler:324()]ICPL error - 00d80ff0 [ 2116.374631] [qed_pglueb_rbc_attn_handler:324()]ICPL error - 00d80ff0 [ 2135.250564] [qed_pglueb_rbc_attn_handler:324()]ICPL error - 00d80ff0 [...] Reduce the logging level of two false-positive prone error messages from notice to verbose on initialization (only) to not mix it with real error attentions while debugging. Fixes: 666db4862f2d ("qed: Revise load sequence to avoid PCI errors") Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	qed: suppress "don't support RoCE & iWARP" flooding on HW init	Alexander Lobakin	1	-2/+2
	Change the verbosity of the "don't support RoCE & iWARP simultaneously" warning to debug level to stop flooding on driver/hardware initialization: [ 4.783230] qede 01:00.00: Storm FW 8.37.7.0, Management FW 8.52.9.0 [MBI 15.10.6] [eth0] [ 4.810020] [qed_rdma_set_pf_params:2076()]Current day drivers don't support RoCE & iWARP simultaneously on the same PF. Default to RoCE-only [ 4.861186] qede 01:00.01: Storm FW 8.37.7.0, Management FW 8.52.9.0 [MBI 15.10.6] [eth1] [ 4.893311] [qed_rdma_set_pf_params:2076()]Current day drivers don't support RoCE & iWARP simultaneously on the same PF. Default to RoCE-only [ 5.181713] qede a1:00.00: Storm FW 8.37.7.0, Management FW 8.52.9.0 [MBI 15.10.6] [eth2] [ 5.224740] [qed_rdma_set_pf_params:2076()]Current day drivers don't support RoCE & iWARP simultaneously on the same PF. Default to RoCE-only [ 5.276449] qede a1:00.01: Storm FW 8.37.7.0, Management FW 8.52.9.0 [MBI 15.10.6] [eth3] [ 5.318671] [qed_rdma_set_pf_params:2076()]Current day drivers don't support RoCE & iWARP simultaneously on the same PF. Default to RoCE-only [ 5.369548] qede a1:00.02: Storm FW 8.37.7.0, Management FW 8.52.9.0 [MBI 15.10.6] [eth4] [ 5.411645] [qed_rdma_set_pf_params:2076()]Current day drivers don't support RoCE & iWARP simultaneously on the same PF. Default to RoCE-only Fixes: e0a8f9de16fc ("qed: Add iWARP enablement support") Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	netdevsim: fix unbalaced locking in nsim_create()	Taehee Yoo	1	-2/+2
	In the nsim_create(), rtnl_lock() is called before nsim_bpf_init(). If nsim_bpf_init() is failed, rtnl_unlock() should be called, but it isn't called. So, unbalanced locking would occur. Fixes: e05b2d141fef ("netdevsim: move netdev creation/destruction to dev probe") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: dsa: microchip: call phy_remove_link_mode during probe	Helmut Grohne	3	-23/+23
	When doing "ip link set dev ... up" for a ksz9477 backed link, ksz9477_phy_setup is called and it calls phy_remove_link_mode to remove 1000baseT HDX. During phy_remove_link_mode, phy_advertise_supported is called. Doing so reverts any previous change to advertised link modes e.g. using a udevd .link file. phy_remove_link_mode is not meant to be used while opening a link and should be called during phy probe when the link is not yet available to userspace. Therefore move the phy_remove_link_mode calls into ksz9477_switch_register. It indirectly calls dsa_register_switch, which creates the relevant struct phy_devices and we update the link modes right after that. At that time dev->features is already initialized by ksz9477_switch_detect. Remove phy_setup from ksz_dev_ops as no users remain. Link: https://lore.kernel.org/netdev/20200715192722.GD1256692@lunn.ch/ Fixes: 42fc6a4c613019 ("net: dsa: microchip: prepare PHY for proper advertisement") Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: hns3: fix return value error when query MAC link status fail	Jian Shen	2	-27/+25
	Currently, PF queries the MAC link status per second by calling function hclge_get_mac_link_status(). It return the error code when failed to send cmdq command to firmware. It's incorrect, because this return value is used as the MAC link status, which 0 means link down, and none-zero means link up. So fixes it. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: hns3: fix error handling for desc filling	Yunsheng Lin	2	-0/+9
	The content of the TX desc is automatically cleared by the HW when the HW has sent out the packet to the wire. When desc filling fails in hns3_nic_net_xmit(), it will call hns3_clear_desc() to do the error handling, which miss zeroing of the TX desc and the checking if a unmapping is needed. So add the zeroing and checking in hns3_clear_desc() to avoid the above problem. Also add DESC_TYPE_UNKNOWN to indicate the info in desc_cb is not valid, because hns3_nic_reclaim_desc() may treat the desc_cb->type of zero as packet and add to the sent pkt statistics accordingly. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: hns3: fix for not calculating TX BD send size correctly	Yunsheng Lin	2	-3/+1
	With GRO and fraglist support, the SKB can be aggregated to a total size of 65535, and when that SKB is forwarded through a bridge, the size of the SKB may be pushed to exceed the size of 65535 when br_dev_queue_push_xmit() is called. The max send size of BD supported by the HW is 65535, when a SKB with a headlen of over 65535 is sent to the driver, the driver needs to use multi BD to send the linear data, and the send size of the last BD is calculated incorrectly by the driver who is using '&' operation, which causes a TX error. Use '%' operation to fix this problem. Fixes: 3fe13ed95dd3 ("net: hns3: avoid mult + div op in critical data path") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: hns3: fix for not unmapping TX buffer correctly	Yunsheng Lin	1	-11/+3
	When a big TX buffer is sent using multi BD, the driver maps the whole TX buffer, and unmaps it using info in desc_cb corresponding to each BD, but only the info in the desc_cb of first BD is correct, other info in desc_cb is wrong, which causes TX unmapping problem when SMMU is on. Only set the mapping and freeing info in the desc_cb of first BD to fix this problem, because the TX buffer only need to be unmapped and freed once. Fixes: 1e8a7977d09f("net: hns3: add handling for big TX fragment") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huzhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: udp: Fix wrong clean up for IS_UDPLITE macro	Miaohe Lin	2	-2/+2
	We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is checked. Fixes: b2bf1e2659b1 ("[UDP]: Clean up for IS_UDPLITE macro") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net-sysfs: add a newline when printing 'tx_timeout' by sysfs	Xiongfeng Wang	1	-1/+1
	When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to add a newline for easy reading. root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout 0root@syzkaller:~# Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	net: ethernet: ravb: exit if re-initialization fails in tx timeout	Yoshihiro Shimoda	1	-2/+24
	According to the report of [1], this driver is possible to cause the following error in ravb_tx_timeout_work(). ravb e6800000.ethernet ethernet: failed to switch device to config mode This error means that the hardware could not change the state from "Operation" to "Configuration" while some tx and/or rx queue are operating. After that, ravb_config() in ravb_dmac_init() will fail, and then any descriptors will be not allocaled anymore so that NULL pointer dereference happens after that on ravb_start_xmit(). To fix the issue, the ravb_tx_timeout_work() should check the return values of ravb_stop_dma() and ravb_dmac_init(). If ravb_stop_dma() fails, ravb_tx_timeout_work() re-enables TX and RX and just exits. If ravb_dmac_init() fails, just exits. [1] https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/ Reported-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	udp: Improve load balancing for SO_REUSEPORT.	Kuniyuki Iwashima	2	-12/+18
	Currently, SO_REUSEPORT does not work well if connected sockets are in a UDP reuseport group. Then reuseport_has_conns() returns true and the result of reuseport_select_sock() is discarded. Also, unconnected sockets have the same score, hence only does the first unconnected socket in udp_hslot always receive all packets sent to unconnected sockets. So, the result of reuseport_select_sock() should be used for load balancing. The noteworthy point is that the unconnected sockets placed after connected sockets in sock_reuseport.socks will receive more packets than others because of the algorithm in reuseport_select_sock(). index \| connected \| reciprocal_scale \| result --------------------------------------------- 0 \| no \| 20% \| 40% 1 \| no \| 20% \| 20% 2 \| yes \| 20% \| 0% 3 \| no \| 20% \| 40% 4 \| yes \| 20% \| 0% If most of the sockets are connected, this can be a problem, but it still works better than now. Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21	udp: Copy has_conns in reuseport_grow().	Kuniyuki Iwashima	1	-0/+1
	If an unconnected socket in a UDP reuseport group connect()s, has_conns is set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all sockets in udp_hslot looking for the connected socket with the highest score. However, when the number of sockets bound to the port exceeds max_socks, reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2() to return without scanning all sockets, resulting in that packets sent to connected sockets may be distributed to unconnected sockets. Therefore, reuseport_grow() should copy has_conns. Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	tipc: allow to build NACK message in link timeout function	Tung Nguyen	1	-1/+1
	Commit 02288248b051 ("tipc: eliminate gap indicator from ACK messages") eliminated sending of the 'gap' indicator in regular ACK messages and only allowed to build NACK message with enabled probe/probe_reply. However, necessary correction for building NACK message was missed in tipc_link_timeout() function. This leads to significant delay and link reset (due to retransmission failure) in lossy environment. This commit fixes it by setting the 'probe' flag to 'true' when the receive deferred queue is not empty. As a result, NACK message will be built to send back to another peer. Fixes: 02288248b051 ("tipc: eliminate gap indicator from ACK messages") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	net: neterion: vxge: reduce stack usage in VXGE_COMPLETE_VPATH_TX	Bixuan Cui	1	-1/+1
	Fix the warning: [-Werror=-Wframe-larger-than=] drivers/net/ethernet/neterion/vxge/vxge-main.c: In function'VXGE_COMPLETE_VPATH_TX.isra.37': drivers/net/ethernet/neterion/vxge/vxge-main.c:119:1: warning: the frame size of 1056 bytes is larger than 1024 bytes Dropping the NR_SKB_COMPLETED to 16 is appropriate that won't have much impact on performance and functionality. Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	net: ag71xx: add missed clk_disable_unprepare in error path of probe	Huang Guobin	1	-1/+2
	The ag71xx_mdio_probe() forgets to call clk_disable_unprepare() when of_reset_control_get_exclusive() failed. Add the missed call to fix it. Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Huang Guobin <huangguobin4@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	net/sched: act_ct: fix restore the qdisc_skb_cb after defrag	wenxu	1	-2/+14
	The fragment packets do defrag in tcf_ct_handle_fragments will clear the skb->cb which make the qdisc_skb_cb clear too. So the qdsic_skb_cb should be store before defrag and restore after that. It also update the pkt_len after all the fragments finish the defrag to one packet and make the following actions counter correct. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	nfc: s3fwrn5: add missing release on skb in s3fwrn5_recv_frame	Navid Emamdoost	1	-0/+1
	The implementation of s3fwrn5_recv_frame() is supposed to consume skb on all execution paths. Release skb before returning -ENODEV. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	crypto/chtls: correct net_device reference count	Vinay Kumar Yadav	1	-1/+1
	ip_dev_find() call holds net_device reference which is not needed, use __ip_dev_find() which does not hold reference. v1->v2: - Correct submission tree. - Add fixes tag. Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	crypto/chtls: fix tls alert messages corrupted by tls data	Vinay Kumar Yadav	1	-3/+4
	When tls data skb is pending for Tx and tls alert comes , It is wrongly overwrite the record type of tls data to tls alert record type. fix the issue correcting it. v1->v2: - Correct submission tree. - Add fixes tag. Fixes: 6919a8264a32 ("Crypto/chtls: add/delete TLS header in driver") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	ionic: use mutex to protect queue operations	Shannon Nelson	3	-25/+17
	The ionic_wait_on_bit_lock() was a open-coded mutex knock-off used only for protecting the queue reset operations, and there was no reason not to use the real thing. We can use the lock more correctly and to better protect the queue stop and start operations from cross threading. We can also remove a useless and expensive bit operation from the Rx path. This fixes a case found where the link_status_check from a link flap could run into an MTU change and cause a crash. Fixes: beead698b173 ("ionic: Add the basic NDO callbacks for netdev support") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	ionic: keep rss hash after fw update	Shannon Nelson	1	-3/+2
	Make sure the RSS hash key is kept across a fw update by not de-initing it when an update is happening. Fixes: c672412f6172 ("ionic: remove lifs on fw reset") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	ionic: update filter id after replay	Shannon Nelson	1	-0/+24
	When we replay the rx filters after a fw-upgrade we get new filter_id values from the FW, which we need to save and update in our local filter list. This allows us to delete the filters with the correct filter_id when we're done. Fixes: 7e4d47596b68 ("ionic: replay filters after fw upgrade") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	ionic: fix up filter locks and debug msgs	Shannon Nelson	2	-10/+12
	Add in a couple of forgotten spinlocks and fix up some of the debug messages around filter management. Fixes: c1e329ebec8d ("ionic: Add management of rx filters") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	ionic: use offset for ethtool regs data	Shannon Nelson	1	-2/+5
	Use an offset to write the second half of the regs data into the second half of the buffer instead of overwriting the first half. Fixes: 4d03e00a2140 ("ionic: Add initial ethtool support") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	net: hsr: check for return value of skb_put_padto()	Murali Karicheri	1	-6/+11
	skb_put_padto() can fail. So check for return type and return NULL for skb. Caller checks for skb and acts correctly if it is NULL. Fixes: 6d6148bc78d2 ("net: hsr: fix incorrect lsdu size in the tag of HSR frames for small frames") Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	Documentation: bareudp: update iproute2 sample commands	Guillaume Nault	1	-6/+13
	bareudp.rst was written before iproute2 gained support for this new type of tunnel. Therefore, the sample command lines didn't match the final iproute2 implementation. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	mlxsw: destroy workqueue when trap_register in mlxsw_emad_init	Liu Jian	1	-1/+2
	When mlxsw_core_trap_register fails in mlxsw_emad_init, destroy_workqueue() shouled be called to destroy mlxsw_core->emad_wq. Fixes: d965465b60ba ("mlxsw: core: Fix possible deadlock") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	dpaa_eth: Fix one possible memleak in dpaa_eth_probe	Liu Jian	1	-1/+1
	When dma_coerce_mask_and_coherent() fails, the alloced netdev need to be freed. Fixes: 060ad66f9795 ("dpaa_eth: change DMA device") Signed-off-by: Liu Jian <liujian56@huawei.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	net/smc: fix dmb buffer shortage	Karsten Graul	1	-2/+2
	There is a current limit of 1920 registered dmb buffers per ISM device for smc-d. One link group can contain 255 connections, each connection is using one dmb buffer. When the connection is closed then the registered buffer is held in a queue and is reused by the next connection. When a link group is 'full' then another link group is created and uses an own buffer pool. The link groups are added to a list using list_add() which puts a new link group to the first position in the list. In the situation that many connections are opened (>1920) and a few of them stay open while others are closed quickly we end up with at least 8 link groups. For a new connection a matching link group is looked up, iterating over the list of link groups. The trailing 7 link groups all have registered dmb buffers which could be reused, while the first link group has only a few dmb buffers and then hit the 1920 limit. Because the first link group is not full (255 connection limit not reached) it is chosen and finally the connection falls back to TCP because there is no dmb buffer available in this link group. There are multiple ways to fix that: using list_add_tail() allows to scan older link groups first for free buffers which ensures that buffers are reused first. This fixes the problem for smc-r link groups as well. For smc-d there is an even better way to address this problem because smc-d does not have the 255 connections per link group limit. So fix the problem for smc-d by allowing large link groups. Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	net/smc: put slot when connection is killed	Karsten Graul	1	-1/+5
	To get a send slot smc_wr_tx_get_free_slot() is called, which might wait for a free slot. When smc_wr_tx_get_free_slot() returns there is a check if the connection was killed in the meantime. In that case don't only return an error, but also put back the free slot. Fixes: b290098092e4 ("net/smc: cancel send and receive for terminated socket") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA	David Howells	2	-2/+2
	rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as if rxrpc_recvmsg() indicating ENODATA if there's nothing for it to read. Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read as this particular error doesn't get stored in ->sk_err by the networking core. Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive errors (there's no way for it to report which call, if any, the error was caused by). Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20	net: bcmgenet: add missed clk_disable_unprepare in bcmgenet_probe	Zhang Changzhong	1	-2/+2
	The driver forgets to call clk_disable_unprepare() in error path after a success calling for clk_prepare_enable(). Fix to goto err_clk_disable if clk_prepare_enable() is successful. Fixes: c80d36ff63a5 ("net: bcmgenet: Use devm_clk_get_optional() to get the clocks") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>