linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-10-27	net: enetc: survive memory pressure without crashing	Vladimir Oltean	1	-0/+5
	Under memory pressure, enetc_refill_rx_ring() may fail, and when called during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not checked for. An extreme case of memory pressure will result in exactly zero buffers being allocated for the RX ring, and in such a case it is expected that hardware drops all RX packets due to lack of buffers. This does not happen, because the reset-default value of the consumer and produces index is 0, and this makes the ENETC think that all buffers have been initialized and that it owns them (when in reality none were). The hardware guide explains this best: \| Configure the receive ring producer index register RBaPIR with a value \| of 0. The producer index is initially configured by software but owned \| by hardware after the ring has been enabled. Hardware increments the \| index when a frame is received which may consume one or more BDs. \| Hardware is not allowed to increment the producer index to match the \| consumer index since it is used to indicate an empty condition. The ring \| can hold at most RBLENR[LENGTH]-1 received BDs. \| \| Configure the receive ring consumer index register RBaCIR. The \| consumer index is owned by software and updated during operation of the \| of the BD ring by software, to indicate that any receive data occupied \| in the BD has been processed and it has been prepared for new data. \| - If consumer index and producer index are initialized to the same \| value, it indicates that all BDs in the ring have been prepared and \| hardware owns all of the entries. \| - If consumer index is initialized to producer index plus N, it would \| indicate N BDs have been prepared. Note that hardware cannot start if \| only a single buffer is prepared due to the restrictions described in \| (2). \| - Software may write consumer index to match producer index anytime \| while the ring is operational to indicate all received BDs prior have \| been processed and new BDs prepared for hardware. Normally, the value of rx_ring->rcir (consumer index) is brought in sync with the rx_ring->next_to_use software index, but this only happens if page allocation ever succeeded. When PI==CI==0, the hardware appears to receive frames and write them to DMA address 0x0 (?!), then set the READY bit in the BD. The enetc_clean_rx_ring() function (and its XDP derivative) is naturally not prepared to handle such a condition. It will attempt to process those frames using the rx_swbd structure associated with index i of the RX ring, but that structure is not fully initialized (enetc_new_page() does all of that). So what happens next is undefined behavior. To operate using no buffer, we must initialize the CI to PI + 1, which will block the hardware from advancing the CI any further, and drop everything. The issue was seen while adding support for zero-copy AF_XDP sockets, where buffer memory comes from user space, which can even decide to supply no buffers at all (example: "xdpsock --txonly"). However, the bug is present also with the network stack code, even though it would take a very determined person to trigger a page allocation failure at the perfect time (a series of ifup/ifdown under memory pressure should eventually reproduce it given enough retries). Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20221027182925.3256653-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29	net: enetc: cache accesses to &priv->si->hw	Vladimir Oltean	1	-11/+17
	The &priv->si->hw construct dereferences 2 pointers and makes lines longer than they need to be, in turn making the code harder to read. Replace &priv->si->hw accesses with a "hw" variable when there are 2 or more accesses within a function that dereference this. This includes loops, since &priv->si->hw is a loop invariant. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28	net: drop the weight argument from netif_napi_add	Jakub Kicinski	1	-2/+1
	We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	net: enetc: deny offload of tc-based TSN features on VF interfaces	Vladimir Oltean	1	-20/+1
	TSN features on the ENETC (taprio, cbs, gate, police) are configured through a mix of command BD ring messages and port registers: enetc_port_rd(), enetc_port_wr(). Port registers are a region of the ENETC memory map which are only accessible from the PCIe Physical Function. They are not accessible from the Virtual Functions. Moreover, attempting to access these registers crashes the kernel: $ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001 fsl_enetc_vf 0000:00:01.0: Adding to iommu group 15 fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002) fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0 $ tc qdisc replace dev eno0vf0 root taprio num_tc 8 map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \ sched-entry S 0x7f 900000 sched-entry S 0x80 100000 flags 0x2 Unable to handle kernel paging request at virtual address ffff800009551a08 Internal error: Oops: 96000007 [#1] PREEMPT SMP pc : enetc_setup_tc_taprio+0x170/0x47c lr : enetc_setup_tc_taprio+0x16c/0x47c Call trace: enetc_setup_tc_taprio+0x170/0x47c enetc_setup_tc+0x38/0x2dc taprio_change+0x43c/0x970 taprio_init+0x188/0x1e0 qdisc_create+0x114/0x470 tc_modify_qdisc+0x1fc/0x6c0 rtnetlink_rcv_msg+0x12c/0x390 Split enetc_setup_tc() into separate functions for the PF and for the VF drivers. Also remove enetc_qos.o from being included into enetc-vf.ko, since it serves absolutely no purpose there. Fixes: 34c6adf1977b ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220916133209.3351399-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	net: enetc: move enetc_set_psfp() out of the common enetc_set_features()	Vladimir Oltean	1	-31/+1
	The VF netdev driver shouldn't respond to changes in the NETIF_F_HW_TC flag; only PFs should. Moreover, TSN-specific code should go to enetc_qos.c, which should not be included in the VF driver. Fixes: 79e499829f3f ("net: enetc: add hw tc hw offload features for PSPF capability") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220916133209.3351399-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	net: enetc: count the tc-taprio window drops	Po Liu	1	-2/+11
	The enetc scheduler for IEEE 802.1Qbv has 2 options (depending on PTGCR[TG_DROP_DISABLE]) when we attempt to send an oversized packet which will never fit in its allotted time slot for its traffic class: either block the entire port due to head-of-line blocking, or drop the packet and set a bit in the writeback format of the transmit buffer descriptor, allowing other packets to be sent. We obviously choose the second option in the driver, but we do not detect the drop condition, so from the perspective of the network stack, the packet is sent and no error counter is incremented. This change checks the writeback of the TX BD when tc-taprio is enabled, and increments a specific ethtool statistics counter and a generic "tx_dropped" counter in ndo_get_stats64. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09	net: enetc: Remove useless DMA-32 fallback configuration	Christophe JAILLET	1	-6/+2
	As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32 bits case will also fail for the same reason. Simplify code and remove some dead code accordingly. [1]: https://lkml.org/lkml/2021/6/7/398 Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/dbecd4eb49a9586ee343b5473dda4b84c42112e9.1641742884.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-13	bpf: Let bpf_warn_invalid_xdp_action() report more info	Paolo Abeni	1	-1/+1
	In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the generated stack-trace pointed out at least the involved device driver. Let's additionally include the program name and id, and the relevant device name. If the user needs additional infos, he can fetch them via a kernel probe, leveraging the arguments added here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
2021-10-21	net: enetc: use the skb variable directly in enetc_clean_tx_ring()	Vladimir Oltean	1	-2/+1
	The code checks whether the skb had one-step TX timestamping enabled, in order to schedule the work item for emptying the priv->tx_skbs queue. That code checks for "tx_swbd->skb" directly, when we already had a skb retrieved using enetc_tx_swbd_get_skb(tx_swbd) - a TX software BD can also hold an XDP_TX packet or an XDP frame. But since the direct tx_swbd dereference is in an "if" block guarded by the non-NULL quality of "skb", accessing "tx_swbd->skb" directly is not wrong, just confusing. Just use the local variable named "skb". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-21	net: enetc: remove local "priv" variable in enetc_clean_tx_ring()	Vladimir Oltean	1	-4/+1
	The "priv" variable is needed in the "check_writeback" scope since commit d39823121911 ("enetc: add hardware timestamping support"). Since commit 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping"), we also need "priv" in the larger function scope. So the local variable from the "if" block scope is not needed, and actually shadows the other one. Delete it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13	net: enetc: fix check for allocation failure	Dan Carpenter	1	-1/+3
	This was supposed to be a check for if dma_alloc_coherent() failed but it has a copy and paste bug so it will not work. Fixes: fb8629e2cbfc ("net: enetc: add support for software TSO") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20211013080456.GC6010@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13	net: enetc: include ip6_checksum.h for csum_ipv6_magic	Ioana Ciornei	1	-0/+1
	For those architectures which do not define_HAVE_ARCH_IPV6_CSUM, we need to include ip6_checksum.h which provides the csum_ipv6_magic() function. Fixes: fb8629e2cbfc ("net: enetc: add support for software TSO") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211012121358.16641-1-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08	net: enetc: add support for software TSO	Ioana Ciornei	1	-20/+299
	This patch adds support for driver level TSO in the enetc driver using the TSO API. Beside using the usual tso_build_hdr(), tso_build_data() this specific implementation also has to compute the checksum, both IP and L4, for each resulted segment. This is because the ENETC controller does not support Tx checksum offload which is needed in order to take advantage of TSO. With the workaround for the ENETC MDIO erratum in place the Tx path of the driver is forced to lock/unlock for each skb sent. This is why, even though we are computing the checksum by hand we see the following improvement in TCP termination on the LS1028A SoC, on a single A72 core running at 1.3GHz: before: 1.63 Gbits/sec after: 2.34 Gbits/sec Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08	net: enetc: declare NETIF_F_HW_CSUM and do it in software	Ioana Ciornei	1	-1/+7
	This is just a preparation patch for software TSO in the enetc driver. Unfortunately, ENETC does not support Tx checksum offload which would normally render TSO, even software, impossible. Declare NETIF_F_HW_CSUM as part of the feature set and do it at driver level using skb_csum_hwoffload_help() so that we can move forward and also add support for TSO in the next patch. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	1	-5/+2
	net/mptcp/protocol.c 977d293e23b4 ("mptcp: ensure tx skbs always have the MPTCP ext") efe686ffce01 ("mptcp: ensure tx skbs always have the MPTCP ext") same patch merged in both trees, keep net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-19	enetc: Fix uninitialized struct dim_sample field usage	Claudiu Manoil	1	-1/+1
	The only struct dim_sample member that does not get initialized by dim_update_sample() is comp_ctr. (There is special API to initialize comp_ctr: dim_update_sample_with_comps(), and it is currently used only for RDMA.) comp_ctr is used to compute curr_stats->cmps and curr_stats->cpe_ratio (see dim_calc_stats()) which in turn are consumed by the rdma_dim_() API. Therefore, functionally, the net_dim() API consumers are not affected. Nevertheless, fix the computation of statistics based on an uninitialized variable, even if the mentioned statistics are not used at the moment. Fixes: ae0e6a5d1627 ("enetc: Add adaptive interrupt coalescing") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19	enetc: Fix illegal access when reading affinity_hint	Claudiu Manoil	1	-4/+1
	irq_set_affinity_hit() stores a reference to the cpumask_t parameter in the irq descriptor, and that reference can be accessed later from irq_affinity_hint_proc_show(). Since the cpu_mask parameter passed to irq_set_affinity_hit() has only temporary storage (it's on the stack memory), later accesses to it are illegal. Thus reads from the corresponding procfs affinity_hint file can result in paging request oops. The issue is fixed by the get_cpu_mask() helper, which provides a permanent storage for the cpumask_t parameter. Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-16	net: enetc: Make use of the helper function dev_err_probe()	Cai Huoqing	1	-4/+2
	When possible use dev_err_probe help to properly deal with the PROBE_DEFER error, the benefit is that DEFER issue will be logged in the devices_deferred debugfs file. And using dev_err_probe() can reduce code size, and simplify the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23	enetc: fix locking for one-step timestamping packet transfer	Yangbo Lu	1	-9/+9
	The previous patch to support PTP Sync packet one-step timestamping described one-step timestamping packet handling logic as below in commit message: - Trasmit packet immediately if no other one in transfer, or queue to skb queue if there is already one in transfer. The test_and_set_bit_lock() is used here to lock and check state. - Start a work when complete transfer on hardware, to release the bit lock and to send one skb in skb queue if has. There was not problem of the description, but there was a mistake in implementation. The locking/test_and_set_bit_lock() should be put in enetc_start_xmit() which may be called by worker, rather than in enetc_xmit(). Otherwise, the worker calling enetc_start_xmit() after bit lock released is not able to lock again for transfer. Fixes: 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping") Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: apply the MDIO workaround for XDP_REDIRECT too	Vladimir Oltean	1	-0/+4
	Described in fd5736bf9f23 ("enetc: Workaround for MDIO register access issue") is a workaround for a hardware bug that requires a register access of the MDIO controller to never happen concurrently with a register access of a port PF. To avoid that, a mutual exclusion scheme with rwlocks was implemented - the port PF accessors are the 'read' side, and the MDIO accessors are the 'write' side. When we do XDP_REDIRECT between two ENETC interfaces, all is fine because the MDIO lock is already taken from the NAPI poll loop. But when the ingress interface is not ENETC, just the egress is, the MDIO lock is not taken, so we might access the port PF registers concurrently with MDIO, which will make the link flap due to wrong values returned from the PHY. To avoid this, let's just slap an enetc_lock_mdio/enetc_unlock_mdio at the beginning and ending of enetc_xdp_xmit. The fact that the MDIO lock is designed as a rwlock is important here, because the read side is reentrant (that is one of the main reasons why we chose it). Usually, the way we benefit of its reentrancy is by running the data path concurrently on both CPUs, but in this case, we benefit from the reentrancy by taking the lock even when the lock is already taken (and that's the situation where ENETC is both the ingress and the egress interface for XDP_REDIRECT, which was fine before and still is fine now). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: fix buffer leaks with XDP_TX enqueue rejections	Vladimir Oltean	1	-4/+12
	If the TX ring is congested, enetc_xdp_tx() returns false for the current XDP frame (represented as an array of software BDs). This array of software TX BDs is constructed in enetc_rx_swbd_to_xdp_tx_swbd from software BDs freshly cleaned from the RX ring. The issue is that we scrub the RX software BDs too soon, more precisely before we know that we can enqueue the TX BDs successfully into the TX ring. If we can't enqueue them (and enetc_xdp_tx returns false), we call enetc_xdp_drop which attempts to recycle the buffers held by the RX software BDs. But because we scrubbed those RX BDs already, two things happen: (a) we leak their memory (b) we populate the RX software BD ring with an all-zero rx_swbd structure, which makes the buffer refill path allocate more memory. enetc_refill_rx_ring -> if (unlikely(!rx_swbd->page)) -> enetc_new_page That is a recipe for fast OOM. Fixes: 7ed2bc80074e ("net: enetc: add support for XDP_TX") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: handle the invalid XDP action the same way as XDP_DROP	Vladimir Oltean	1	-4/+3
	When the XDP program returns an invalid action, we should free the RX buffer. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: use dedicated TX rings for XDP	Vladimir Oltean	1	-7/+39
	It is possible for one CPU to perform TX hashing (see netdev_pick_tx) between the 8 ENETC TX rings, and the TX hashing to select TX queue 1. At the same time, it is possible for the other CPU to already use TX ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual exclusion between XDP and the network stack, we run into an issue because the ENETC TX procedure is not reentrant. The obvious approach would be to just make XDP take the lock of the network stack's TX queue corresponding to the ring it's about to enqueue in. For XDP_REDIRECT, this is quite straightforward, a lock at the beginning and end of enetc_xdp_xmit() should do the trick. But for XDP_TX, it's a bit more complicated. For one, we do TX batching all by ourselves for frames with the XDP_TX verdict. This is something we would like to keep the way it is, for performance reasons. But batching means that the network stack's lock should be kept from the first enqueued XDP_TX frame and until we ring the doorbell. That is mostly fine, except for cases when in the same NAPI loop we have mixed XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while we are holding the lock from the RX NAPI, then bam, deadlock. The naive answer could be 'just flush the XDP_TX frames first, then release the network stack's TX queue lock, then call xdp_do_flush_map()'. But even xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT frames, so unless we unlock/relock the TX queue around xdp_do_redirect(), there simply isn't any clean way to protect XDP_TX from concurrent network stack .ndo_start_xmit() on another CPU. So we need to take a different approach, and that is to reserve two rings for the sole use of XDP. We leave TX rings 0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we pick them from the end of the priv->tx_ring array. We make an effort to keep the mapping done by enetc_alloc_msix() which decides which CPU handles the TX completions of which TX ring in its NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the XDP TX ring of CPU 1 is handled by TX ring 7. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: remove unneeded xdp_do_flush_map()	Vladimir Oltean	1	-5/+0
	xdp_do_redirect already contains: -> dev_map_enqueue -> __xdp_enqueue -> bq_enqueue -> bq_xmit_all // if we have more than 16 frames So the logic from enetc will never be hit, because ENETC_DEFAULT_TX_WORK is 128. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: stop XDP NAPI processing when build_skb() fails	Vladimir Oltean	1	-2/+2
	When the code path below fails: enetc_clean_rx_ring_xdp // XDP_PASS -> enetc_build_skb -> enetc_map_rx_buff_to_skb -> build_skb enetc_clean_rx_ring_xdp will 'break', but that 'break' instruction isn't strong enough to actually break the NAPI poll loop, just the switch/case statement for XDP actions. So we increment rx_frm_cnt and go to the next frames minding our own business. Instead let's do what the skb NAPI poll function does, and break the loop now, waiting for the memory pressure to go away. Otherwise the next calls to build_skb() are likely to fail too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: recycle buffers for frames with RX errors	Vladimir Oltean	1	-0/+2
	When receiving a frame with errors, currently we do nothing with it (we don't construct an skb or an xdp_buff), we just exit the NAPI poll loop. Let's put the buffer back into the RX ring (similar to XDP_DROP). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: rename the buffer reuse helpers	Vladimir Oltean	1	-30/+24
	enetc_put_xdp_buff has nothing to do with XDP, frankly, it is just a helper to populate the recycle end of the shadow RX BD ring (next_to_alloc) with a given buffer. On the other hand, enetc_put_rx_buff plays more tricks than its name would suggest. So let's rename enetc_put_rx_buff into enetc_flip_rx_buff to reflect the half-page buffer reuse tricks that it employs, and enetc_put_xdp_buff into enetc_put_rx_buff which suggests a more garden-variety operation. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16	net: enetc: remove redundant clearing of skb/xdp_frame pointer in TX conf path	Vladimir Oltean	1	-2/+0
	Later in enetc_clean_tx_ring we have: /* Scrub the swbd here so we don't have to do that * when we reuse it during xmit / memset(tx_swbd, 0, sizeof(tx_swbd)); So these assignments are unnecessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15	enetc: convert to schedule_work()	Yangbo Lu	1	-1/+1
	Convert system_wq queue_work() to schedule_work() which is a wrapper around it, since the former is a rare construct. Fixes: 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping") Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12	enetc: support PTP Sync packet one-step timestamping	Yangbo Lu	1	-20/+171
	This patch is to add support for PTP Sync packet one-step timestamping. Since ENETC single-step register has to be configured dynamically per packet for correctionField offeset and UDP checksum update, current one-step timestamping packet has to be sent only when the last one completes transmitting on hardware. So, on the TX, this patch handles one-step timestamping packet as below: - Trasmit packet immediately if no other one in transfer, or queue to skb queue if there is already one in transfer. The test_and_set_bit_lock() is used here to lock and check state. - Start a work when complete transfer on hardware, to release the bit lock and to send one skb in skb queue if has. And the configuration for one-step timestamping on ENETC before transmitting is, - Set one-step timestamping flag in extension BD. - Write 30 bits current timestamp in tstamp field of extension BD. - Update PTP Sync packet originTimestamp field with current timestamp. - Configure single-step register for correctionField offeset and UDP checksum update. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12	enetc: mark TX timestamp type per skb	Yangbo Lu	1	-4/+6
	Mark TX timestamp type per skb on skb->cb[0], instead of global variable for all skbs. This is a preparation for one step timestamp support. For one-step timestamping enablement, there will be both one-step and two-step PTP messages to transfer. And a skb queue is needed for one-step PTP messages making sure start to send current message only after the last one completed on hardware. (ENETC single-step register has to be dynamically configured per message.) So, marking TX timestamp type per skb is required. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-09	enetc: Use generic rule to map Tx rings to interrupt vectors	Claudiu Manoil	1	-5/+1
	Even if the current mapping is correct for the 1 CPU and 2 CPU cases (currently enetc is included in SoCs with up to 2 CPUs only), better use a generic rule for the mapping to cover all possible cases. The number of CPUs is the same as the number of interrupt vectors: Per device Tx rings - device_tx_ring[idx], where idx = 0..n_rings_total-1 Per interrupt vector Tx rings - int_vector[i].ring[j], where i = 0..n_int_vects-1 j = 0..n_rings_per_v-1 Mapping rule - n_rings_per_v = n_rings_total / n_int_vects for i = 0..n_int_vects - 1: for j = 0..n_rings_per_v - 1: idx = n_int_vects * j + i int_vector[i].ring[j] <- device_tx_ring[idx] Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210409071613.28912-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09	net: enetc: fix TX ring interrupt storm	Vladimir Oltean	1	-1/+2
	The blamed commit introduced a bit in the TX software buffer descriptor structure for determining whether a BD is final or not; we rearm the TX interrupt vector for every frame (hence final BD) transmitted. But there is a problem with the patch: it replaced a condition whose expression is a bool which was evaluated at the beginning of the "while" loop with a bool expression that is evaluated on the spot: tx_swbd->is_eof. The problem with the latter expression is that the tx_swbd has already been incremented at that stage, so the tx_swbd->is_eof check is in fact with the _next_ software BD. Which is _not_ final. The effect is that the CPU is in 100% load with ksoftirqd because it does not acknowledge the TX interrupt, so the handler keeps getting called again and again. The fix is to restore the code structure, and keep the local bool is_eof variable, just to assign it the tx_swbd->is_eof value instead of !!tx_swbd->skb. Fixes: d504498d2eb3 ("net: enetc: add a dedicated is_eof bit in the TX software BD") Reported-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20210409192759.3895104-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09	net: enetc: fix array underflow in error handling code	Dan Carpenter	1	-1/+1
	This loop will try to unmap enetc_unmap_tx_buff[-1] and crash. Fixes: 9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YHBHfCY/yv3EnM9z@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-03-31	net: enetc: add support for XDP_REDIRECT	Vladimir Oltean	1	-11/+201
	The driver implementation of the XDP_REDIRECT action reuses parts from XDP_TX, most notably the enetc_xdp_tx function which transmits an array of TX software BDs. Only this time, the buffers don't have DMA mappings, we need to create them. When a BPF program reaches the XDP_REDIRECT verdict for a frame, we can employ the same buffer reuse strategy as for the normal processing path and for XDP_PASS: we can flip to the other page half and seed that to the RX ring. Note that scatter/gather support is there, but disabled due to lack of multi-buffer support in XDP (which is added by this series): https://patchwork.kernel.org/project/netdevbpf/cover/cover.1616179034.git.lorenzo@kernel.org/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31	net: enetc: add support for XDP_TX	Vladimir Oltean	1	-24/+193
	For reflecting packets back into the interface they came from, we create an array of TX software BDs derived from the RX software BDs. Therefore, we need to extend the TX software BD structure to contain most of the stuff that's already present in the RX software BD structure, for reasons that will become evident in a moment. For a frame with the XDP_TX verdict, we don't reuse any buffer right away as we do for XDP_DROP (the same page half) or XDP_PASS (the other page half, same as the skb code path). Because the buffer transfers ownership from the RX ring to the TX ring, reusing any page half right away is very dangerous. So what we can do is we can recycle the same page half as soon as TX is complete. The code path is: enetc_poll -> enetc_clean_rx_ring_xdp -> enetc_xdp_tx -> enetc_refill_rx_ring (time passes, another MSI interrupt is raised) enetc_poll -> enetc_clean_tx_ring -> enetc_recycle_xdp_tx_buff But that creates a problem, because there is a potentially large time window between enetc_xdp_tx and enetc_recycle_xdp_tx_buff, period in which we'll have less and less RX buffers. Basically, when the ship starts sinking, the knee-jerk reaction is to let enetc_refill_rx_ring do what it does for the standard skb code path (refill every 16 consumed buffers), but that turns out to be very inefficient. The problem is that we have no rx_swbd->page at our disposal from the enetc_reuse_page path, so enetc_refill_rx_ring would have to call enetc_new_page for every buffer that we refill (if we choose to refill at this early stage). Very inefficient, it only makes the problem worse, because page allocation is an expensive process, and CPU time is exactly what we're lacking. Additionally, there is an even bigger problem: if we let enetc_refill_rx_ring top up the ring's buffers again from the RX path, remember that the buffers sent to transmission haven't disappeared anywhere. They will be eventually sent, and processed in enetc_clean_tx_ring, and an attempt will be made to recycle them. But surprise, the RX ring is already full of new buffers, because we were premature in deciding that we should refill. So not only we took the expensive decision of allocating new pages, but now we must throw away perfectly good and reusable buffers. So what we do is we implement an elastic refill mechanism, which keeps track of the number of in-flight XDP_TX buffer descriptors. We top up the RX ring only up to the total ring capacity minus the number of BDs that are in flight (because we know that those BDs will return to us eventually). The enetc driver manages 1 RX ring per CPU, and the default TX ring management is the same. So we do XDP_TX towards the TX ring of the same index, because it is affined to the same CPU. This will probably not produce great results when we have a tc-taprio/tc-mqprio qdisc on the interface, because in that case, the number of TX rings might be greater, but I didn't add any checks for that yet (mostly because I didn't know what checks to add). It should also be noted that we need to change the DMA mapping direction for RX buffers, since they may now be reflected into the TX ring of the same device. We choose to use DMA_BIDIRECTIONAL instead of unmapping and remapping as DMA_TO_DEVICE, because performance is better this way. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31	net: enetc: add support for XDP_DROP and XDP_PASS	Vladimir Oltean	1	-20/+264
	For the RX ring, enetc uses an allocation scheme based on pages split into two buffers, which is already very efficient in terms of preventing reallocations / maximizing reuse, so I see no reason why I would change that. +--------+--------+--------+--------+--------+--------+--------+ \| \| \| \| \| \| \| \| \| half B \| half B \| half B \| half B \| half B \| half B \| half B \| \| \| \| \| \| \| \| \| +--------+--------+--------+--------+--------+--------+--------+ \| \| \| \| \| \| \| \| \| half A \| half A \| half A \| half A \| half A \| half A \| half A \| RX ring \| \| \| \| \| \| \| \| +--------+--------+--------+--------+--------+--------+--------+ ^ ^ \| \| next_to_clean next_to_alloc next_to_use +--------+--------+--------+--------+--------+ \| \| \| \| \| \| \| half B \| half B \| half B \| half B \| half B \| \| \| \| \| \| \| +--------+--------+--------+--------+--------+--------+--------+ \| \| \| \| \| \| \| \| \| half B \| half B \| half A \| half A \| half A \| half A \| half A \| RX ring \| \| \| \| \| \| \| \| +--------+--------+--------+--------+--------+--------+--------+ \| \| \| ^ ^ \| half A \| half A \| \| \| \| \| \| next_to_clean next_to_use +--------+--------+ ^ \| next_to_alloc then when enetc_refill_rx_ring is called, whose purpose is to advance next_to_use, it sees that it can take buffers up to next_to_alloc, and it says "oh, hey, rx_swbd->page isn't NULL, I don't need to allocate one!". The only problem is that for default PAGE_SIZE values of 4096, buffer sizes are 2048 bytes. While this is enough for normal skb allocations at an MTU of 1500 bytes, for XDP it isn't, because the XDP headroom is 256 bytes, and including skb_shared_info and alignment, we end up being able to make use of only 1472 bytes, which is insufficient for the default MTU. To solve that problem, we implement scatter/gather processing in the driver, because we would really like to keep the existing allocation scheme. A packet of 1500 bytes is received in a buffer of 1472 bytes and another one of 28 bytes. Because the headroom required by XDP is different (and much larger) than the one required by the network stack, whenever a BPF program is added or deleted on the port, we drain the existing RX buffers and seed new ones with the required headroom. We also keep the required headroom in rx_ring->buffer_offset. The simplest way to implement XDP_PASS, where an skb must be created, is to create an xdp_buff based on the next_to_clean RX BDs, but not clear those BDs from the RX ring yet, just keep the original index at which the BDs for this frame started. Then, if the verdict is XDP_PASS, instead of converting the xdb_buff to an skb, we replay a call to enetc_build_skb (just as in the normal enetc_clean_rx_ring case), starting from the original BD index. We would also like to be minimally invasive to the regular RX data path, and not check whether there is a BPF program attached to the ring on every packet. So we create a separate RX ring processing function for XDP. Because we only install/remove the BPF program while the interface is down, we forgo the rcu_read_lock() in enetc_clean_rx_ring, since there shouldn't be any circumstance in which we are processing packets and there is a potentially freed BPF program attached to the RX ring. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31	net: enetc: move up enetc_reuse_page and enetc_page_reusable	Vladimir Oltean	1	-19/+19
	For XDP_TX, we need to call enetc_reuse_page from enetc_clean_tx_ring, so we need to avoid a forward declaration. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31	net: enetc: clean the TX software BD on the TX confirmation path	Vladimir Oltean	1	-0/+4
	With the future introduction of some new fields into enetc_tx_swbd such as is_xdp_tx, is_xdp_redirect etc, we need not only to set these bits to true from the XDP_TX/XDP_REDIRECT code path, but also to false from the old code paths. This is because TX software buffer descriptors are kept in a ring that is shadow of the hardware TX ring, so these structures keep getting reused, and there is always the possibility that when a software BD is reused (after we ran a full circle through the TX ring), the old user of the tx_swbd had set is_xdp_tx = true, and now we are sending a regular skb, which would need to set is_xdp_tx = false. To be minimally invasive to the old code paths, let's just scrub the software TX BD in the TX confirmation path (enetc_clean_tx_ring), once we know that nobody uses this software TX BD (tx_ring->next_to_clean hasn't yet been updated, and the TX paths check enetc_bd_unused which tells them if there's any more space in the TX ring for a new enqueue). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31	net: enetc: add a dedicated is_eof bit in the TX software BD	Vladimir Oltean	1	-4/+3
	In the transmit path, if we have a scatter/gather frame, it is put into multiple software buffer descriptors, the last of which has the skb pointer populated (which is necessary for rearming the TX MSI vector and for collecting the two-step TX timestamp from the TX confirmation path). At the moment, this is sufficient, but with XDP_TX, we'll need to service TX software buffer descriptors that don't have an skb pointer, however they might be final nonetheless. So add a dedicated bit for final software BDs that we populate and check explicitly. Also, we keep looking just for an skb when doing TX timestamping, because we don't want/need that for XDP. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31	net: enetc: move skb creation into enetc_build_skb	Vladimir Oltean	1	-37/+44
	We need to build an skb from two code paths now: from the plain RX data path and from the XDP data path when the verdict is XDP_PASS. Create a new enetc_build_skb function which contains the essential steps for building an skb based on the first and last positions of buffer descriptors within the RX ring. We also squash the enetc_process_skb function into enetc_build_skb, because what that function did wasn't very meaningful on its own. The "rx_frm_cnt++" instruction has been moved around napi_gro_receive for cosmetic reasons, to be in the same spot as rx_byte_cnt++, which itself must be before napi_gro_receive, because that's when we lose ownership of the skb. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31	net: enetc: consume the error RX buffer descriptors in a dedicated function	Vladimir Oltean	1	-16/+27
	We can and should check the RX BD errors before starting to build the skb. The only apparent reason why things are done in this backwards order is to spare one call to enetc_rxbd_next. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10	net: enetc: make enetc_refill_rx_ring update the consumer index	Vladimir Oltean	1	-9/+8
	Since commit fd5736bf9f23 ("enetc: Workaround for MDIO register access issue"), enetc_refill_rx_ring no longer updates the RX BD ring's consumer index, that is left to be done by the caller. This has led to bugs such as the ones found in 96a5223b918c ("net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr") and 3a5d12c9be6f ("net: enetc: keep RX ring consumer index in sync with hardware"), so it is desirable that we move back the update of the consumer index into enetc_refill_rx_ring. The trouble with that is the different MDIO locking context for the two callers of enetc_refill_rx_ring: - enetc_clean_rx_ring runs under enetc_lock_mdio() - enetc_setup_rxbdr runs outside enetc_lock_mdio() Simplify the callers of enetc_refill_rx_ring by making enetc_setup_rxbdr explicitly take enetc_lock_mdio() around the call. It will be the only place in need of ensuring the hot accessors can be used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10	net: enetc: remove forward declaration for enetc_map_tx_buffs	Vladimir Oltean	1	-38/+35
	There is no other reason why this forward declaration exists rather than poor ordering of the functions. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10	net: enetc: remove forward-declarations of enetc_clean_{rx,tx}_ring	Vladimir Oltean	1	-48/+44
	This patch moves the NAPI enetc_poll after enetc_clean_rx_ring such that we can delete the forward declarations. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10	net: enetc: simplify callers of enetc_rxbd_next	Vladimir Oltean	1	-16/+5
	When we iterate through the BDs in the RX ring, the software producer index (which is already passed by value to enetc_rxbd_next) lags behind, and we end up with this funny looking "++i == rx_ring->bd_count" check so that we drag it after us. Let's pass the software producer index "i" by reference, so that enetc_rxbd_next can increment it by itself (mod rx_ring->bd_count), especially since enetc_rxbd_next has to increment the index anyway. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10	net: enetc: don't initialize unused ports from a separate code path	Vladimir Oltean	1	-19/+2
	Since commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for unused ports too") there is a requirement to initialize the memories of unused PFs too, which has left the probe path in a bit of a rough shape, because we basically have a minimal initialization path for unused PFs which is separate from the main initialization path. Now that initializing a control BD ring is as simple as calling enetc_setup_cbdr, let's move that outside of enetc_alloc_si_resources (unused PFs don't need classification rules, so no point in allocating them just to free them later). But enetc_alloc_si_resources is called both for PFs and for VFs, so now that enetc_setup_cbdr is no longer called from this common function, it means that the VF probe path needs to explicitly call enetc_setup_cbdr too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10	net: enetc: pass bd_count as an argument to enetc_setup_cbdr	Vladimir Oltean	1	-4/+2
	It makes no sense from an API perspective to first initialize some portion of struct enetc_cbdr outside enetc_setup_cbdr, then leave that function to initialize the rest. enetc_setup_cbdr should be able to perform all initialization given a zero-initialized struct enetc_cbdr. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10	net: enetc: squash clear_cbdr and free_cbdr into teardown_cbdr	Vladimir Oltean	1	-4/+2
	All call sites call enetc_clear_cbdr and enetc_free_cbdr one after another, so let's combine the two functions into a single method named enetc_teardown_cbdr which does both, and in the same order. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10	net: enetc: save the mode register address inside struct enetc_cbdr	Vladimir Oltean	1	-2/+2
	enetc_clear_cbdr depends on struct enetc_hw because it must disable the ring through a register write. We'd like to remove that dependency, so let's do what's already done with the producer and consumer indices, which is to save the iomem address in a variable kept in struct enetc_cbdr. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>