linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2021-11-01	ibmvnic: don't stop queue in xmit	Sukadev Bhattiprolu	1	-2/+0
	If adapter's resetting bit is on, discard the packet but don't stop the transmit queue - instead leave that to the reset code. With this change, it is possible that we may get several calls to ibmvnic_xmit() that simply discard packets and return. But if we stop the queue here, we might end up doing so just after __ibmvnic_open() started the queues (during a hard/soft reset) and before the ->resetting bit was cleared. If that happens, there will be no one to restart queue and transmissions will be blocked indefinitely. This can cause a TIMEOUT reset and with auto priority failover enabled, an unnecessary FAILOVER reset to less favored backing device and then a FAILOVER back to the most favored backing device. If we hit the window repeatedly, we can get stuck in a loop of TIMEOUT, FAILOVER, FAILOVER resets leaving the adapter unusable for extended periods of time. Fixes: 7f5b030830fe ("ibmvnic: Free skb's in cases of failure in transmit") Reported-by: Abdul Haleem <abdhalee@in.ibm.com> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	selftests: udp: test for passing SO_MARK as cmsg	Jakub Kicinski	4	-0/+131
	Before fix: \| Case IPv6 rejection returned 0, expected 1 \|FAIL - 1/4 cases failed With the fix: \| OK Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01	udp6: allow SO_MARK ctrl msg to affect routing	Jakub Kicinski	1	-1/+1
	Commit c6af0c227a22 ("ip: support SO_MARK cmsg") added propagation of SO_MARK from cmsg to skb->mark. For IPv4 and raw sockets the mark also affects route lookup, but in case of IPv6 the flow info is initialized before cmsg is parsed. Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg") Reported-and-tested-by: Xintong Hu <huxintong@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	cls_flower: Fix inability to match GRE/IPIP packets	Yoshiki Komachi	3	-1/+18
	When a packet of a new flow arrives in openvswitch kernel module, it dissects the packet and passes the extracted flow key to ovs-vswtichd daemon. If hw- offload configuration is enabled, the daemon creates a new TC flower entry to bypass openvswitch kernel module for the flow (TC flower can also offload flows to NICs but this time that does not matter). In this processing flow, I found the following issue in cases of GRE/IPIP packets. When ovs_flow_key_extract() in openvswitch module parses a packet of a new GRE (or IPIP) flow received on non-tunneling vports, it extracts information of the outer IP header for ip_proto/src_ip/dst_ip match keys. This means ovs-vswitchd creates a TC flower entry with IP protocol/addresses match keys whose values are those of the outer IP header. OTOH, TC flower, which uses flow_dissector (different parser from openvswitch module), extracts information of the inner IP header. The following flow is an example to describe the issue in more detail. <----------- Outer IP -----------------> <---------- Inner IP ----------> +----------+--------------+--------------+----------+----------+----------+ \| ip_proto \| src_ip \| dst_ip \| ip_proto \| src_ip \| dst_ip \| \| 47 (GRE) \| 192.168.10.1 \| 192.168.10.2 \| 6 (TCP) \| 10.0.0.1 \| 10.0.0.2 \| +----------+--------------+--------------+----------+----------+----------+ In this case, TC flower entry and extracted information are shown as below: - ovs-vswitchd creates TC flower entry with: - ip_proto: 47 - src_ip: 192.168.10.1 - dst_ip: 192.168.10.2 - TC flower extracts below for IP header matches: - ip_proto: 6 - src_ip: 10.0.0.1 - dst_ip: 10.0.0.2 Thus, GRE or IPIP packets never match the TC flower entry, as each dissector behaves differently. IMHO, the behavior of TC flower (flow dissector) does not look correct, as ip_proto/src_ip/dst_ip in TC flower match means the outermost IP header information except for GRE/IPIP cases. This patch adds a new flow_dissector flag FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP which skips dissection of the encapsulated inner GRE/IPIP header in TC flower classifier. Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	selftests: net: bridge: update IGMP/MLD membership interval value	Nikolay Aleksandrov	2	-6/+18
	When I fixed IGMPv3/MLDv2 to use the bridge's multicast_membership_interval value which is chosen by user-space instead of calculating it based on multicast_query_interval and multicast_query_response_interval I forgot to update the selftests relying on that behaviour. Now we have to manually set the expected GMI value to perform the tests correctly and get proper results (similar to IGMPv2 behaviour). Fixes: fac3cb82a54a ("net: bridge: mcast: use multicast_membership_interval for IGMPv3") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	net: bridge: fix uninitialized variables when BRIDGE_CFM is disabled	Ivan Vecera	1	-0/+2
	Function br_get_link_af_size_filtered() calls br_cfm_{,peer}_mep_count() that return a count. When BRIDGE_CFM is not enabled these functions simply return -EOPNOTSUPP but do not modify count parameter and calling function then works with uninitialized variables. Modify these inline functions to return zero in count parameter. Fixes: b6d0425b816e ("bridge: cfm: Netlink Notifications.") Cc: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	net: phylink: avoid mvneta warning when setting pause parameters	Russell King (Oracle)	1	-1/+1
	mvneta does not support asymetric pause modes, and it flags this by the lack of AsymPause in the supported field. When setting pause modes, we check that pause->rx_pause == pause->tx_pause, but only when pause autoneg is enabled. When pause autoneg is disabled, we still allow pause->rx_pause != pause->tx_pause, which is incorrect when the MAC does not support asymetric pause, and causes mvneta to issue a warning. Fix this by removing the test for pause->autoneg, so we always check that pause->rx_pause == pause->tx_pause for network devices that do not support AsymPause. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	nfp: fix potential deadlock when canceling dim work	Yinjun Zhang	1	-4/+0
	When port is linked down, the process which has acquired rtnl_lock will wait for the in-progress dim work to finish, and the work also acquires rtnl_lock, which may cause deadlock. Currently IRQ_MOD registers can be configured by `ethtool -C` and dim work, and which will take effect depends on the execution order, rtnl_lock is useless here, so remove them. Fixes: 9d32e4e7e9e1 ("nfp: add support for coalesce adaptive feature") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	nfp: fix NULL pointer access when scheduling dim work	Yinjun Zhang	1	-2/+2
	Each rx/tx ring has a related dim work, when rx/tx ring number is decreased by `ethtool -L`, the corresponding rx_ring or tx_ring is assigned NULL, while its related work is not destroyed. When scheduled, the work will access NULL pointer. Fixes: 9d32e4e7e9e1 ("nfp: add support for coalesce adaptive feature") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	selftests/net: update .gitignore with newly added tests	Shuah Khan	1	-0/+4
	Update .gitignore with newly added tests: tools/testing/selftests/net/af_unix/test_unix_oob tools/testing/selftests/net/gro tools/testing/selftests/net/ioam6_parser tools/testing/selftests/net/toeplitz Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	net: amd-xgbe: Toggle PLL settings during rate change	Shyam Sundar S K	2	-1/+27
	For each rate change command submission, the FW has to do a phy power off sequence internally. For this to happen correctly, the PLL re-initialization control setting has to be turned off before sending mailbox commands and re-enabled once the command submission is complete. Without the PLL control setting, the link up takes longer time in a fixed phy configuration. Fixes: 47f164deab22 ("amd-xgbe: Add PCI device support") Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com> Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	sctp: return true only for pathmtu update in sctp_transport_pl_toobig	Xin Long	1	-3/+4
	sctp_transport_pl_toobig() supposes to return true only if there's pathmtu update, so that in sctp_icmp_frag_needed() it would call sctp_assoc_sync_pmtu() and sctp_retransmit(). This patch is to fix these return places in sctp_transport_pl_toobig(). Fixes: 836964083177 ("sctp: do state transition when receiving an icmp TOOBIG packet") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	sctp: subtract sctphdr len in sctp_transport_pl_hlen	Xin Long	1	-1/+2
	sctp_transport_pl_hlen() is called to calculate the outer header length for PL. However, as the Figure in rfc8899#section-4.4: Any additional headers .--- MPS -----. \| \| \| v v v +------------------------------+ \| IP \| ** \| PL \| protocol data \| +------------------------------+ <----- PLPMTU -----> <---------- PMTU --------------> Outer header are IP + Any additional headers, which doesn't include Packetization Layer itself header, namely sctphdr, whereas sctphdr is counted by __sctp_mtu_payload(). The incorrect calculation caused the link pathmtu to be set larger than expected by t->pl.pmtu + sctp_transport_pl_hlen(). This patch is to fix it by subtracting sctphdr len in sctp_transport_pl_hlen(). Fixes: d9e2e410ae30 ("sctp: add the constants/variables and states and some APIs for transport") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	sctp: reset probe_timer in sctp_transport_pl_update	Xin Long	1	-3/+1
	sctp_transport_pl_update() is called when transport update its dst and pathmtu, instead of stopping the PLPMTUD probe timer, PLPMTUD should start over and reset the probe timer. Otherwise, the PLPMTUD service would stop. Fixes: 92548ec2f1f9 ("sctp: add the probe timer in transport for PLPMTUD") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29	sctp: allow IP fragmentation when PLPMTUD enters Error state	Xin Long	2	-7/+10
	Currently when PLPMTUD enters Error state, transport pathmtu will be set to MIN_PLPMTU(512) while probe is continuing with BASE_PLPMTU(1200). It will cause pathmtu to stay in a very small value, even if the real pmtu is some value like 1000. RFC8899 doesn't clearly say how to set the value in Error state. But one possibility could be keep using BASE_PLPMTU for the real pmtu, but allow to do IP fragmentation when it's in Error state. As it says in rfc8899#section-5.4: Some paths could be unable to sustain packets of the BASE_PLPMTU size. The Error State could be implemented to provide robustness to such paths. This allows fallback to a smaller than desired PLPMTU rather than suffer connectivity failure. This could utilize methods such as endpoint IP fragmentation to enable the PL sender to communicate using packets smaller than the BASE_PLPMTU. This patch is to set pmtu to BASE_PLPMTU instead of MIN_PLPMTU for Error state in sctp_transport_pl_send/toobig(), and set packet ipfragok for non-probe packets when it's in Error state. Fixes: 1dc68c194571 ("sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	Revert "net: hns3: fix pause config problem after autoneg disabled"	Guangbin Huang	5	-57/+10
	This reverts commit 3bda2e5df476417b6d08967e2d84234a59d57b1c. According to discussion with Andrew as follow: https://lore.kernel.org/netdev/09eda9fe-196b-006b-6f01-f54e75715961@huawei.com/ HNS3 driver needs to separate pause autoneg from general autoneg, so revert this incorrect patch. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Link: https://lore.kernel.org/r/20211028140624.53149-1-huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28	mptcp: fix corrupt receiver key in MPC + data + checksum	Davide Caratti	2	-15/+28
	using packetdrill it's possible to observe that the receiver key contains random values when clients transmit MP_CAPABLE with data and checksum (as specified in RFC8684 §3.1). Fix the layout of mptcp_out_options, to avoid using the skb extension copy when writing the MP_CAPABLE sub-option. Fixes: d7b269083786 ("mptcp: shrink mptcp_out_options struct") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/233 Reported-by: Poorva Sonparote <psonparo@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20211027203855.264600-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28	riscv, bpf: Fix potential NULL dereference	Björn Töpel	1	-1/+2
	The bpf_jit_binary_free() function requires a non-NULL argument. When the RISC-V BPF JIT fails to converge in NR_JIT_ITERATIONS steps, jit_data->header will be NULL, which triggers a NULL dereference. Avoid this by checking the argument, prior calling the function. Fixes: ca6cb5447cec ("riscv, bpf: Factor common RISC-V JIT code") Signed-off-by: Björn Töpel <bjorn@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20211028125115.514587-1-bjorn@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28	octeontx2-af: Fix possible null pointer dereference.	Rakesh Babu Saladi	2	-1/+4
	This patch fixes possible null pointer dereference in files "rvu_debugfs.c" and "rvu_nix.c" Fixes: 8756828a8148 ("octeontx2-af: Add NPA aura and pool contexts to debugfs") Fixes: 9a946def264d ("octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries") Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	octeontx2-af: Display all enabled PF VF rsrc_alloc entries.	Rakesh Babu	1	-32/+106
	Currently, we are using a fixed buffer size of length 2048 to display rsrc_alloc output. As a result a maximum of 2048 characters of rsrc_alloc output is displayed, which may lead sometimes to display only partial output. This patch fixes this dependency on max limit of buffer size and displays all PF VF entries. Each column of the debugfs entry "rsrc_alloc" uses a fixed width of 12 characters to print the list of LFs of each block for a PF/VF. If the length of list of LFs of a block exceeds this fixed width then the list gets truncated and displays only a part of the list. This patch fixes this by using the maximum possible length of list of LFs among all blocks of all PFs and VFs entries as the width size. Fixes: f7884097141b ("octeontx2-af: Formatting debugfs entry rsrc_alloc.") Fixes: 23205e6d06d4 ("octeontx2-af: Dump current resource provisioning status") Signed-off-by: Rakesh Babu <rsaladi2@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <Sunil.Goutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	octeontx2-af: Check whether ipolicers exists	Subbaraya Sundeep	1	-0/+8
	While displaying ingress policers information in debugfs check whether ingress policers exist in the hardware or not because some platforms(CN9XXX) do not have this feature. Fixes: e7d8971763f3 ("octeontx2-af: cn10k: Debugfs support for bandwidth") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Rakesh Babu <rsaladi2@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	net: ethernet: microchip: lan743x: Fix skb allocation failure	Yuiko Oshino	1	-4/+9
	The driver allocates skb during ndo_open with GFP_ATOMIC which has high chance of failure when there are multiple instances. GFP_KERNEL is enough while open and use GFP_ATOMIC only from interrupt context. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	net/tls: Fix flipped sign in async_wait.err assignment	Daniel Jordan	1	-1/+1
	sk->sk_err contains a positive number, yet async_wait.err wants the opposite. Fix the missed sign flip, which Jakub caught by inspection. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	net/tls: Fix flipped sign in tls_err_abort() calls	Daniel Jordan	2	-11/+15
	sk->sk_err appears to expect a positive value, a convention that ktls doesn't always follow and that leads to memory corruption in other code. For instance, [kworker] tls_encrypt_done(..., err=<negative error from crypto request>) tls_err_abort(.., err) sk->sk_err = err; [task] splice_from_pipe_feed ... tls_sw_do_sendpage if (sk->sk_err) { ret = -sk->sk_err; // ret is positive splice_from_pipe_feed (continued) ret = actor(...) // ret is still positive and interpreted as bytes // written, resulting in underflow of buf->len and // sd->len, leading to huge buf->offset and bogus // addresses computed in later calls to actor() Fix all tls_err_abort() callers to pass a negative error code consistently and centralize the error-prone sign flip there, throwing in a warning to catch future misuse and uninlining the function so it really does only warn once. Cc: stable@vger.kernel.org Fixes: c46234ebb4d1e ("tls: RX path for ktls") Reported-by: syzbot+b187b77c8474f9648fae@syzkaller.appspotmail.com Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	net/smc: Correct spelling mistake to TCPF_SYN_RECV	Wen Gu	1	-1/+1
	There should use TCPF_SYN_RECV instead of TCP_SYN_RECV. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	net/smc: Fix smc_link->llc_testlink_time overflow	Tony Lu	1	-1/+1
	The value of llc_testlink_time is set to the value stored in net->ipv4.sysctl_tcp_keepalive_time when linkgroup init. The value of sysctl_tcp_keepalive_time is already jiffies, so we don't need to multiply by HZ, which would cause smc_link->llc_testlink_time overflow, and test_link send flood. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	nfp: bpf: relax prog rejection for mtu check through max_pkt_offset	Yu Xiao	3	-9/+26
	MTU change is refused whenever the value of new MTU is bigger than the max packet bytes that fits in NFP Cluster Target Memory (CTM). However, an eBPF program doesn't always need to access the whole packet data. The maximum direct packet access (DPA) offset has always been caculated by verifier and stored in the max_pkt_offset field of prog aux data. Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Reviewed-by: Yinjun Zhang <yinjun.zhang@corigine.com> Reviewed-by: Niklas Soderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28	vmxnet3: do not stop tx queues after netif_device_detach()	Dongli Zhang	1	-1/+0
	The netif_device_detach() conditionally stops all tx queues if the queues are running. There is no need to call netif_tx_stop_all_queues() again. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27	tracing: Do not warn when connecting eprobe to non existing event	Steven Rostedt (VMware)	1	-2/+2
	When the syscall trace points are not configured in, the kselftests for ftrace will try to attach an event probe (eprobe) to one of the system call trace points. This triggered a WARNING, because the failure only expects to see memory issues. But this is not the only failure. The user may attempt to attach to a non existent event, and the kernel must not warn about it. Link: https://lkml.kernel.org/r/20211027120854.0680aa0f@gandalf.local.home Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27	r8169: Add device 10ec:8162 to driver r8169	Janghyub Seo	1	-0/+1
	This patch makes the driver r8169 pick up device Realtek Semiconductor Co. , Ltd. Device [10ec:8162]. Signed-off-by: Janghyub Seo <jhyub06@gmail.com> Suggested-by: Rushab Shah <rushabshah32@gmail.com> Link: https://lore.kernel.org/r/1635231849296.1489250046.441294000@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27	ptp: Document the PTP_CLK_MAGIC ioctl number	Randy Dunlap	1	-0/+1
	Add PTP_CLK_MAGIC to the userspace-api/ioctl/ioctl-number.rst documentation file. Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20211024163831.10200-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27	virtio-ring: fix DMA metadata flags	Vincent Whitchurch	1	-1/+1
	The flags are currently overwritten, leading to the wrong direction being passed to the DMA unmap functions. Fixes: 72b5e8958738aaa4 ("virtio-ring: store DMA metadata in desc_extra for split virtqueue") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Link: https://lore.kernel.org/r/20211026133100.17541-1-vincent.whitchurch@axis.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-10-27	usbnet: fix error return code in usbnet_probe()	Wang Hai	1	-0/+1
	Return error code if usb_maxpacket() returns 0 in usbnet_probe() Fixes: 397430b50a36 ("usbnet: sanity check for maxpacket") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211026124015.3025136-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27	ftrace/nds32: Update the proto for ftrace_trace_function to match ftrace_stub	Steven Rostedt (VMware)	1	-1/+1
	The ftrace callback prototype was changed to pass a special ftrace_regs instead of pt_regs as the last parameter, but the static ftrace for nds32 missed updating ftrace_trace_function and this caused a warning when compared to ftrace_stub: ../arch/nds32/kernel/ftrace.c: In function '_mcount': ../arch/nds32/kernel/ftrace.c:24:35: error: comparison of distinct pointer types lacks a cast [-Werror] 24 \| if (ftrace_trace_function != ftrace_stub) \| ^~ Link: https://lore.kernel.org/all/20211027055554.19372-1-rdunlap@infradead.org/ Link: https://lkml.kernel.org/r/20211027125101.33449969@gandalf.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Fixes: d19ad0775dcd6 ("ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27	nios2: Make NIOS2_DTB_SOURCE_BOOL depend on !COMPILE_TEST	Guenter Roeck	1	-0/+1
	nios2:allmodconfig builds fail with make[1]: *** No rule to make target 'arch/nios2/boot/dts/""', needed by 'arch/nios2/boot/dts/built-in.a'. Stop. make: [Makefile:1868: arch/nios2/boot/dts] Error 2 (ignored) This is seen with compile tests since those enable NIOS2_DTB_SOURCE_BOOL, which in turn enables NIOS2_DTB_SOURCE. This causes the build error because the default value for NIOS2_DTB_SOURCE is an empty string. Disable NIOS2_DTB_SOURCE_BOOL for compile tests to avoid the error. Fixes: 2fc8483fdcde ("nios2: Build infrastructure") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2021-10-27	net: hns3: adjust string spaces of some parameters of tx bd info in debugfs	Guangbin Huang	1	-3/+3
	This patch adjusts the string spaces of some parameters of tx bd info in debugfs according to their maximum needs. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27	net: hns3: expand buffer len for some debugfs command	Guangbin Huang	1	-3/+3
	The specified buffer length for three debugfs files fd_tcam, uc and tqp is not enough for their maximum needs, so this patch fixes them. Fixes: b5a0b70d77b9 ("net: hns3: refactor dump fd tcam of debugfs") Fixes: 1556ea9120ff ("net: hns3: refactor dump mac list of debugfs") Fixes: d96b0e59468d ("net: hns3: refactor dump reg of debugfs") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27	net: hns3: add more string spaces for dumping packets number of queue info in debugfs	Jie Wang	1	-2/+2
	As the width of packets number registers is 32 bits, they needs at most 10 characters for decimal data printing, but now the string spaces is not enough, so this patch fixes it. Fixes: e44c495d95e ("net: hns3: refactor queue info of debugfs") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27	net: hns3: fix data endian problem of some functions of debugfs	Jie Wang	1	-16/+14
	The member data in struct hclge_desc is type of __le32, it needs endian conversion before using it, and some functions of debugfs didn't do that, so this patch fixes it. Fixes: c0ebebb9ccc1 ("net: hns3: Add "dcb register" status information query function") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27	net: hns3: ignore reset event before initialization process is done	Guangbin Huang	3	-0/+5
	Currently, if there is a reset event triggered by RAS during device in initialization process, driver may run reset process concurrently with initialization process. In this case, it may cause problem. For example, the RSS indirection table may has not been alloc memory in initialization process yet, but it is used in reset process, it will cause a call trace like this: [61228.744836] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... [61228.897677] Workqueue: hclgevf hclgevf_service_task [hclgevf] [61228.911390] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--) [61228.918670] pc : hclgevf_set_rss_indir_table+0xb4/0x190 [hclgevf] [61228.927812] lr : hclgevf_set_rss_indir_table+0x90/0x190 [hclgevf] [61228.937248] sp : ffff8000162ebb50 [61228.941087] x29: ffff8000162ebb50 x28: ffffb77add72dbc0 x27: ffff0820c7dc8080 [61228.949516] x26: 0000000000000000 x25: ffff0820ad4fc880 x24: ffff0820c7dc8080 [61228.958220] x23: ffff0820c7dc8090 x22: 00000000ffffffff x21: 0000000000000040 [61228.966360] x20: ffffb77add72b9c0 x19: 0000000000000000 x18: 0000000000000030 [61228.974646] x17: 0000000000000000 x16: ffffb77ae713feb0 x15: ffff0820ad4fcce8 [61228.982808] x14: ffffffffffffffff x13: ffff8000962eb7f7 x12: 00003834ec70c960 [61228.991990] x11: 00e0fafa8c206982 x10: 9670facc78a8f9a8 x9 : ffffb77add717530 [61229.001123] x8 : ffff0820ad4fd6b8 x7 : 0000000000000000 x6 : 0000000000000011 [61229.010249] x5 : 00000000000cb1b0 x4 : 0000000000002adb x3 : 0000000000000049 [61229.018662] x2 : ffff8000162ebbb8 x1 : 0000000000000000 x0 : 0000000000000480 [61229.027002] Call trace: [61229.030177] hclgevf_set_rss_indir_table+0xb4/0x190 [hclgevf] [61229.039009] hclgevf_rss_init_hw+0x128/0x1b4 [hclgevf] [61229.046809] hclgevf_reset_rebuild+0x17c/0x69c [hclgevf] [61229.053862] hclgevf_reset_service_task+0x4cc/0xa80 [hclgevf] [61229.061306] hclgevf_service_task+0x6c/0x630 [hclgevf] [61229.068491] process_one_work+0x1dc/0x48c [61229.074121] worker_thread+0x15c/0x464 [61229.078562] kthread+0x168/0x16c [61229.082873] ret_from_fork+0x10/0x18 [61229.088221] Code: 7900e7f6 f904a683 d503201f 9101a3e2 (38616b43) [61229.095357] ---[ end trace 153661a538f6768c ]--- To fix this problem, don't schedule reset task before initialization process is done. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27	net: hns3: change hclge/hclgevf workqueue to WQ_UNBOUND mode	Yufeng Mo	3	-31/+6
	Currently, the workqueue of hclge/hclgevf is executed on the CPU that initiates scheduling requests by default. In stress scenarios, the CPU may be busy and workqueue scheduling is completed after a long period of time. To avoid this situation and implement proper scheduling, use the WQ_UNBOUND mode instead. In this way, the workqueue can be performed on a relatively idle CPU. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27	net: hns3: fix pause config problem after autoneg disabled	Guangbin Huang	5	-10/+57
	If a TP port is configured by follow steps: 1.ethtool -s ethx autoneg off speed 100 duplex full 2.ethtool -A ethx rx on tx on 3.ethtool -s ethx autoneg on(rx&tx negotiated pause results are off) 4.ethtool -s ethx autoneg off speed 100 duplex full In step 3, driver will set rx&tx pause parameters of hardware to off as pause parameters negotiated with link partner are off. After step 4, the "ethtool -a ethx" command shows both rx and tx pause parameters are on. However, pause parameters of hardware are still off and port has no flow control function actually. To fix this problem, if autoneg is disabled, driver uses its saved parameters to restore pause of hardware. If the speed is not changed in this case, there is no link state changed for phy, it will cause the pause parameter is not taken effect, so we need to force phy to go down and up. Fixes: aacbe27e82f0 ("net: hns3: modify how pause options is displayed") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26	bpf: Fix potential race in tail call compatibility check	Toke Høiland-Jørgensen	4	-11/+23
	Lorenzo noticed that the code testing for program type compatibility of tail call maps is potentially racy in that two threads could encounter a map with an unset type simultaneously and both return true even though they are inserting incompatible programs. The race window is quite small, but artificially enlarging it by adding a usleep_range() inside the check in bpf_prog_array_compatible() makes it trivial to trigger from userspace with a program that does, essentially: map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); pid = fork(); if (pid) { key = 0; value = xdp_fd; } else { key = 1; value = tc_fd; } err = bpf_map_update_elem(map_fd, &key, &value, 0); While the race window is small, it has potentially serious ramifications in that triggering it would allow a BPF program to tail call to a program of a different type. So let's get rid of it by protecting the update with a spinlock. The commit in the Fixes tag is the last commit that touches the code in question. v2: - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) v3: - Put lock and the members it protects into an embedded 'owner' struct (Daniel) Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
2021-10-26	bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET	Tejun Heo	1	-4/+4
	bpf_types.h has BPF_MAP_TYPE_INODE_STORAGE and BPF_MAP_TYPE_TASK_STORAGE declared inside #ifdef CONFIG_NET although they are built regardless of CONFIG_NET. So, when CONFIG_BPF_SYSCALL && !CONFIG_NET, they are built without the declarations leading to spurious build failures and not registered to bpf_map_types making them unavailable. Fix it by moving the BPF_MAP_TYPE for the two map types outside of CONFIG_NET. Reported-by: kernel test robot <lkp@intel.com> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/YXG1cuuSJDqHQfRY@slm.duckdns.org
2021-10-26	selftests/bpf: Use recv_timeout() instead of retries	Yucong Sun	1	-55/+20
	We use non-blocking sockets in those tests, retrying for EAGAIN is ugly because there is no upper bound for the packet arrival time, at least in theory. After we fix poll() on sockmap sockets, now we can switch to select()+recv(). Signed-off-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-5-xiyou.wangcong@gmail.com
2021-10-26	net: Implement ->sock_is_readable() for UDP and AF_UNIX	Cong Wang	4	-0/+10
	Yucong noticed we can't poll() sockets in sockmap even when they are the destination sockets of redirections. This is because we never poll any psock queues in ->poll(), except for TCP. With ->sock_is_readable() now we can overwrite >sock_is_readable(), invoke and implement it for both UDP and AF_UNIX sockets. Reported-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-4-xiyou.wangcong@gmail.com
2021-10-26	skmsg: Extract and reuse sk_msg_is_readable()	Cong Wang	3	-14/+16
	tcp_bpf_sock_is_readable() is pretty much generic, we can extract it and reuse it for non-TCP sockets. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-3-xiyou.wangcong@gmail.com
2021-10-26	net: Rename ->stream_memory_read to ->sock_is_readable	Cong Wang	6	-11/+14
	The proto ops ->stream_memory_read() is currently only used by TCP to check whether psock queue is empty or not. We need to rename it before reusing it for non-TCP protocols, and adjust the exsiting users accordingly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-2-xiyou.wangcong@gmail.com
2021-10-26	tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function	Liu Jian	1	-0/+12
	With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls (or multiple cores) on a single socket 'sk' we could get the following flow. msgA, sk msgB, sk ----------- --------------- tcp_bpf_sendmsg() lock(sk) psock = sk->psock tcp_bpf_sendmsg() lock(sk) ... blocking tcp_bpf_send_verdict if (psock->eval == NONE) psock->eval = sk_psock_msg_verdict .. < handle SK_REDIRECT case > release_sock(sk) < lock dropped so grab here > ret = tcp_bpf_sendmsg_redir psock = sk->psock tcp_bpf_send_verdict lock_sock(sk) ... blocking on B if (psock->eval == NONE) <- boom. psock->eval will have msgA state The problem here is we dropped the lock on msgA and grabbed it with msgB. Now we have old state in psock and importantly psock->eval has not been cleared. So msgB will run whatever action was done on A and the verdict program may never see it. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211012052019.184398-1-liujian56@huawei.com
2021-10-26	watchdog: Fix OMAP watchdog early handling	Walter Stoll	1	-1/+5
	TI's implementation does not service the watchdog even if the kernel command line parameter omap_wdt.early_enable is set to 1. This patch fixes the issue. Signed-off-by: Walter Stoll <walter.stoll@duagon.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/88a8fe5229cd68fa0f1fd22f5d66666c1b7057a0.camel@duagon.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>