linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-04-21	i40e, xsk: Get rid of redundant 'fallthrough'	Maciej Fijalkowski	1	-1/+0
	Intel drivers translate actions returned from XDP programs to their own return codes that have the following mapping: XDP_REDIRECT -> I40E_XDP_{REDIR,CONSUMED} XDP_TX -> I40E_XDP_{TX,CONSUMED} XDP_DROP -> I40E_XDP_CONSUMED XDP_ABORTED -> I40E_XDP_CONSUMED XDP_PASS -> I40E_XDP_PASS Commit b8aef650e549 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") introduced new translation XDP_REDIRECT -> I40E_XDP_EXIT which is set when XSK RQ gets full and to indicate that driver should stop further Rx processing. This happens for unsuccessful xdp_do_redirect() so it is valuable to call trace_xdp_exception() for this case. In order to avoid I40E_XDP_EXIT -> IXGBE_XDP_CONSUMED overwrite, XDP_DROP case was moved above which in turn made the 'fallthrough' that is in XDP_ABORTED useless as it became the last label in the switch statement. Simply drop this leftover. Fixes: b8aef650e549 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220421132126.471515-3-maciej.fijalkowski@intel.com
2022-04-21	ixgbe, xsk: Get rid of redundant 'fallthrough'	Maciej Fijalkowski	1	-1/+0
	Intel drivers translate actions returned from XDP programs to their own return codes that have the following mapping: XDP_REDIRECT -> IXGBE_XDP_{REDIR,CONSUMED} XDP_TX -> IXGBE_XDP_{TX,CONSUMED} XDP_DROP -> IXGBE_XDP_CONSUMED XDP_ABORTED -> IXGBE_XDP_CONSUMED XDP_PASS -> IXGBE_XDP_PASS Commit c7dd09fd4628 ("ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") introduced new translation XDP_REDIRECT -> IXGBE_XDP_EXIT which is set when XSK RQ gets full and to indicate that driver should stop further Rx processing. This happens for unsuccessful xdp_do_redirect() so it is valuable to call trace_xdp_exception() for this case. In order to avoid IXGBE_XDP_EXIT -> IXGBE_XDP_CONSUMED overwrite, XDP_DROP case was moved above which in turn made the 'fallthrough' that is in XDP_ABORTED useless as it became the last label in the switch statement. Simply drop this leftover. Fixes: c7dd09fd4628 ("ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220421132126.471515-2-maciej.fijalkowski@intel.com
2022-04-19	bpf: Move rcu lock management out of BPF_PROG_RUN routines	Stanislav Fomichev	1	-2/+6
	Commit 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions") switched a bunch of BPF_PROG_RUN macros to inline routines. This changed the semantic a bit. Due to arguments expansion of macros, it used to be: rcu_read_lock(); array = rcu_dereference(cgrp->bpf.effective[atype]); ... Now, with with inline routines, we have: array_rcu = rcu_dereference(cgrp->bpf.effective[atype]); /* array_rcu can be kfree'd here */ rcu_read_lock(); array = rcu_dereference(array_rcu); I'm assuming in practice rcu subsystem isn't fast enough to trigger this but let's use rcu API properly. Also, rename to lower caps to not confuse with macros. Additionally, drop and expand BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY. See [1] for more context. [1] https://lore.kernel.org/bpf/CAKH8qBs60fOinFdxiiQikK_q0EcVxGvNTQoWvHLEUGbgcj1UYg@mail.gmail.com/T/#u v2 - keep rcu locks inside by passing cgroup_bpf Fixes: 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220414161233.170780-1-sdf@google.com
2022-04-15	ice, xsk: Avoid refilling single Rx descriptors	Maciej Fijalkowski	1	-1/+4
	Call alloc Rx routine for ZC driver only when the amount of unallocated descriptors exceeds given threshold. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-14-maciej.fijalkowski@intel.com
2022-04-15	stmmac, xsk: Diversify return values from xsk_wakeup call paths	Maciej Fijalkowski	1	-2/+2
	Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO return code as the case that XSK is not in the bound state. Returning same code from ndo_xsk_wakeup can be misleading and simply makes it harder to follow what is going on. Change ENXIOs in stmmac's ndo_xsk_wakeup() implementation to EINVALs, so that when probing it is clear that something is wrong on the driver side, not the xsk_{recv,send}msg. There is a -ENETDOWN that can happen from both kernel/driver sides though, but I don't have a correct replacement for this on one of the sides, so let's keep it that way. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-13-maciej.fijalkowski@intel.com
2022-04-15	mlx5, xsk: Diversify return values from xsk_wakeup call paths	Maciej Fijalkowski	1	-1/+1
	Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO return code as the case that XSK is not in the bound state. Returning same code from ndo_xsk_wakeup can be misleading and simply makes it harder to follow what is going on. Change ENXIO in mlx5's ndo_xsk_wakeup() implementation to EINVAL, so that when probing it is clear that something is wrong on the driver side, not the xsk_{recv,send}msg. There is a -ENETDOWN that can happen from both kernel/driver sides though, but I don't have a correct replacement for this on one of the sides, so let's keep it that way. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-12-maciej.fijalkowski@intel.com
2022-04-15	ixgbe, xsk: Diversify return values from xsk_wakeup call paths	Maciej Fijalkowski	1	-3/+3
	Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO return code as the case that XSK is not in the bound state. Returning same code from ndo_xsk_wakeup can be misleading and simply makes it harder to follow what is going on. Change ENXIOs in ixgbe's ndo_xsk_wakeup() implementation to EINVALs, so that when probing it is clear that something is wrong on the driver side, not the xsk_{recv,send}msg. There is a -ENETDOWN that can happen from both kernel/driver sides though, but I don't have a correct replacement for this on one of the sides, so let's keep it that way. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-11-maciej.fijalkowski@intel.com
2022-04-15	i40e, xsk: Diversify return values from xsk_wakeup call paths	Maciej Fijalkowski	1	-3/+3
	Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO return code as the case that XSK is not in the bound state. Returning same code from ndo_xsk_wakeup can be misleading and simply makes it harder to follow what is going on. Change ENXIOs in i40e's ndo_xsk_wakeup() implementation to EINVALs, so that when probing it is clear that something is wrong on the driver side, not the xsk_{recv,send}msg. There is a -ENETDOWN that can happen from both kernel/driver sides though, but I don't have a correct replacement for this on one of the sides, so let's keep it that way. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-10-maciej.fijalkowski@intel.com
2022-04-15	ice, xsk: Diversify return values from xsk_wakeup call paths	Maciej Fijalkowski	1	-3/+3
	Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO return code as the case that XSK is not in the bound state. Returning same code from ndo_xsk_wakeup can be misleading and simply makes it harder to follow what is going on. Change ENXIOs in ice's ndo_xsk_wakeup() implementation to EINVALs, so that when probing it is clear that something is wrong on the driver side, not the xsk_{recv,send}msg. There is a -ENETDOWN that can happen from both kernel/driver sides though, but I don't have a correct replacement for this on one of the sides, so let's keep it that way. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-9-maciej.fijalkowski@intel.com
2022-04-15	ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full	Maciej Fijalkowski	2	-9/+19
	When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK Rx queue being full. In such case, terminate the Rx processing that is being done on the current HW Rx ring and let the user space consume descriptors from XSK Rx queue so that there is room that driver can use later on. Introduce new internal return code IXGBE_XDP_EXIT that will indicate case described above. Note that it does not affect Tx processing that is bound to the same NAPI context, nor the other Rx rings. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-8-maciej.fijalkowski@intel.com
2022-04-15	i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full	Maciej Fijalkowski	2	-10/+23
	When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK Rx queue being full. In such case, terminate the Rx processing that is being done on the current HW Rx ring and let the user space consume descriptors from XSK Rx queue so that there is room that driver can use later on. Introduce new internal return code I40E_XDP_EXIT that will indicate case described above. Note that it does not affect Tx processing that is bound to the same NAPI context, nor the other Rx rings. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-7-maciej.fijalkowski@intel.com
2022-04-15	ice, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full	Maciej Fijalkowski	2	-10/+20
	When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK Rx queue being full. In such case, terminate the Rx processing that is being done on the current HW Rx ring and let the user space consume descriptors from XSK Rx queue so that there is room that driver can use later on. Introduce new internal return code ICE_XDP_EXIT that will indicate case described above. Note that it does not affect Tx processing that is bound to the same NAPI context, nor the other Rx rings. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-6-maciej.fijalkowski@intel.com
2022-04-15	ixgbe, xsk: Decorate IXGBE_XDP_REDIR with likely()	Maciej Fijalkowski	1	-12/+13
	ixgbe_run_xdp_zc() suggests to compiler that XDP_REDIRECT is the most probable action returned from BPF program that AF_XDP has in its pipeline. Let's also bring this suggestion up to the callsite of ixgbe_run_xdp_zc() so that compiler will be able to generate more optimized code which in turn will make branch predictor happy. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-5-maciej.fijalkowski@intel.com
2022-04-15	ice, xsk: Decorate ICE_XDP_REDIR with likely()	Maciej Fijalkowski	1	-10/+11
	ice_run_xdp_zc() suggests to compiler that XDP_REDIRECT is the most probable action returned from BPF program that AF_XDP has in its pipeline. Let's also bring this suggestion up to the callsite of ice_run_xdp_zc() so that compiler will be able to generate more optimized code which in turn will make branch predictor happy. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-4-maciej.fijalkowski@intel.com
2022-04-08	sfc: use hardware tx timestamps for more than PTP	Bert Kenward	1	-1/+2
	The 8000 series and newer NICs all get hardware timestamps from the MAC and can provide timestamps on a normal TX queue, rather than via a slow path through the MC. As such we can use this path for any packet where a hardware timestamp is requested. This also enables support for PTP over transports other than IPv4+UDP. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: Edward Cree <ecree@xilinx.com> Link: https://lore.kernel.org/r/510652dc-54b4-0e11-657e-e37ee3ca26a9@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08	net: phy: micrel: ksz9031/ksz9131: add cabletest support	Marek Vasut	1	-0/+221
	Add cable test support for Micrel KSZ9x31 PHYs. Tested on i.MX8M Mini with KSZ9131RNX in 100/Full mode with pairs shuffled before magnetics: (note: Cable test started/completed messages are omitted) mx8mm-ksz9131-a-d-connected$ ethtool --cable-test eth0 Pair A code OK Pair B code Short within Pair Pair B, fault length: 0.80m Pair C code Short within Pair Pair C, fault length: 0.80m Pair D code OK mx8mm-ksz9131-a-b-connected$ ethtool --cable-test eth0 Pair A code OK Pair B code OK Pair C code Short within Pair Pair C, fault length: 0.00m Pair D code Short within Pair Pair D, fault length: 0.00m Tested on R8A77951 Salvator-XS with KSZ9031RNX and all four pairs connected: (note: Cable test started/completed messages are omitted) r8a7795-ksz9031-all-connected$ ethtool --cable-test eth0 Pair A code OK Pair B code OK Pair C code OK Pair D code OK The CTRL1000 CTL1000_ENABLE_MASTER and CTL1000_AS_MASTER bits are not restored by calling phy_init_hw(), they must be manually cached in ksz9x31_cable_test_start() and restored at the end of ksz9x31_cable_test_get_status(). Signed-off-by: Marek Vasut <marex@denx.de> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Oleksij Rempel <linux@rempel-privat.de> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> Link: https://lore.kernel.org/r/20220407105534.85833-1-marex@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex	David S. Miller	2	-290/+211
	t-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-04-07 Alexander Lobakin says: This hunts down several places around packet templates/dummies for switch rules which are either repetitive, fragile or just not really readable code. It's a common need to add new packet templates and to review such changes as well, try to simplify both with the help of a pair macros and aliases. ice_find_dummy_packet() became very complex at this point with tons of nested if-elses. It clearly showed this approach does not scale, so convert its logics to the simple mask-match + static const array. bloat-o-meter is happy about that (built w/ LLVM 13): add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056) Function old new delta ice_fill_adv_dummy_packet 289 291 +2 ice_adv_add_update_vsi_list 201 - -201 ice_add_adv_rule 2950 2093 -857 Total: Before=414512, After=413456, chg -0.25% add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672) RO Data old new delta ice_dummy_pkt_profiles - 672 +672 Total: Before=37895, After=38567, chg +1.77% Diffstat also looks nice, and adding new packet templates now takes less lines. We'll probably come out with dynamic template crafting in a while, but for now let's improve what we have currently. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	net: mdio: aspeed: Add c45 support	Potin Lai	1	-5/+32
	Add Clause 45 support for Aspeed mdio driver. Signed-off-by: Potin Lai <potin.lai@quantatw.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	net: mdio: aspeed: Introduce read write function for c22 and c45	Potin Lai	1	-12/+34
	Add following additional functions to move out the implementation from aspeed_mdio_read() and aspeed_mdio_write(). c22: - aspeed_mdio_read_c22() - aspeed_mdio_write_c22() c45: - aspeed_mdio_read_c45() - aspeed_mdio_write_c45() Signed-off-by: Potin Lai <potin.lai@quantatw.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	net: mdio: aspeed: move reg accessing part into separate functions	Potin Lai	1	-32/+38
	Add aspeed_mdio_op() and aseed_mdio_get_data() for register accessing. aspeed_mdio_op() handles operations, write command to control register, then check and wait operations is finished (bit 31 is cleared). aseed_mdio_get_data() fetchs the result value of operation from data register. Signed-off-by: Potin Lai <potin.lai@quantatw.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	net: atm: remove the ambassador driver	Jakub Kicinski	4	-3074/+0
	The driver for ATM Ambassador devices spews build warnings on microblaze. The virt_to_bus() calls discard the volatile keyword. The right thing to do would be to migrate this driver to a modern DMA API but it seems unlikely anyone is actually using it. There had been no fixes or functional changes here since the git era begun. In fact it sounds like the FW loading was broken from 2008 'til 2012 - see commit fcdc90b025e6 ("atm: forever loop loading ambassador firmware"). Let's remove this driver, there isn't much changing in the APIs, if users come forward we can apologize and revert. Link: https://lore.kernel.org/all/20220321144013.440d7fc0@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: XDP multibuffer enablement	Andy Gospodarek	2	-4/+4
	Allow aggregation buffers to be in place in the receive path and allow XDP programs to be attached when using a larger than 4k MTU. v3: Add a check to sure XDP program supports multipage packets. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: support transmit and free of aggregation buffers	Andy Gospodarek	5	-23/+126
	This patch adds the following features: - Support for XDP_TX and XDP_DROP action when using xdp_buff with frags - Support for freeing all frags attached to an xdp_buff - Cleanup of TX ring buffers after transmits complete - Slight change in definition of bnxt_sw_tx_bd since nr_frags and RX producer may both need to be used - Clear out skb_shared_info at the end of the buffer v2: Fix uninitialized variable warning in bnxt_xdp_buff_frags_free(). Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff	Andy Gospodarek	3	-7/+85
	Since we have an xdp_buff with frags there needs to be a way to convert that into a valid sk_buff in the event that XDP_PASS is the resulting operation. This adds a new rx_skb_func when the netdev has an MTU that prevents the packets from sitting in a single page. This also make sure that GRO/LRO stay disabled even when using the aggregation ring for large buffers. v3: Use BNXT_PAGE_MODE_BUF_SIZE for build_skb Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: add page_pool support for aggregation ring when using xdp	Andy Gospodarek	1	-30/+47
	If we are using aggregation rings with XDP enabled, allocate page buffers for the aggregation rings from the page_pool. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: change receive ring space parameters	Andy Gospodarek	2	-15/+28
	Modify ring header data split and jumbo parameters to account for the fact that the design for XDP multibuffer puts close to the first 4k of data in a page and the remaining portions of the packet go in the aggregation ring. v3: Simplified code around initial buffer size calculation Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: set xdp_buff pfmemalloc flag if needed	Andy Gospodarek	1	-5/+9
	Set the pfmemaloc flag in the xdp buff so that this can be copied to the skb if needed for an XDP_PASS action. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: adding bnxt_rx_agg_pages_xdp for aggregated xdp	Andy Gospodarek	1	-0/+31
	This patch adds a new function that will read pages from the aggregation ring and create an xdp_buff with frags based on the entries in the aggregation ring. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: rename bnxt_rx_pages to bnxt_rx_agg_pages_skb	Andy Gospodarek	1	-11/+11
	Clarify that this is reading buffers from the aggregation ring. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: refactor bnxt_rx_pages operate on skb_shared_info	Andy Gospodarek	1	-17/+33
	Rather than operating on an sk_buff, add frags from the aggregation ring into the frags of an skb_shared_info. This will allow the caller to use either an sk_buff or xdp_buff. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: add flag to denote that an xdp program is currently attached	Andy Gospodarek	1	-0/+7
	This will be used to determine if bnxt_rx_xdp should be called rather than calling it every time. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08	bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff	Andy Gospodarek	4	-19/+53
	Move initialization of xdp_buff outside of bnxt_rx_xdp to prepare for allowing bnxt_rx_xdp to operate on multibuffer xdp_buffs. v2: Fix uninitalized variables warning in bnxt_xdp.c. v3: Add new define BNXT_PAGE_MODE_BUF_SIZE Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-07	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	46	-342/+545
	No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07	Merge tag 'net-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds	31	-257/+317
	Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - new code bugs: - mctp: correct mctp_i2c_header_create result - eth: fungible: fix reference to __udivdi3 on 32b builds - eth: micrel: remove latencies support lan8814 Previous releases - regressions: - bpf: resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT - vrf: fix packet sniffing for traffic originating from ip tunnels - rxrpc: fix a race in rxrpc_exit_net() - dsa: revert "net: dsa: stop updating master MTU from master.c" - eth: ice: fix MAC address setting Previous releases - always broken: - tls: fix slab-out-of-bounds bug in decrypt_internal - bpf: support dual-stack sockets in bpf_tcp_check_syncookie - xdp: fix coalescing for page_pool fragment recycling - ovs: fix leak of nested actions - eth: sfc: - add missing xdp queue reinitialization - fix using uninitialized xdp tx_queue - eth: ice: - clear default forwarding VSI during VSI release - fix broken IFF_ALLMULTI handling - synchronize_rcu() when terminating rings - eth: qede: confirm skb is allocated before using - eth: aqc111: fix out-of-bounds accesses in RX fixup - eth: slip: fix NPD bug in sl_tx_timeout()" * tag 'net-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) drivers: net: slip: fix NPD bug in sl_tx_timeout() bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets bpf: Support dual-stack sockets in bpf_tcp_check_syncookie myri10ge: fix an incorrect free for skb in myri10ge_sw_tso net: usb: aqc111: Fix out-of-bounds accesses in RX fixup qede: confirm skb is allocated before using net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n net: phy: mscc-miim: reject clause 45 register accesses net: axiemac: use a phandle to reference pcs_phy dt-bindings: net: add pcs-handle attribute net: axienet: factor out phy_node in struct axienet_local net: axienet: setup mdio unconditionally net: sfc: fix using uninitialized xdp tx_queue rxrpc: fix a race in rxrpc_exit_net() net: openvswitch: fix leak of nested actions net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address() net: openvswitch: don't send internal clone attribute to the userspace. net: micrel: Fix KS8851 Kconfig ice: clear cmd_type_offset_bsz for TX rings ice: xsk: fix VSI state check in ice_xsk_wakeup() ...
2022-04-07	hv_netvsc: Print value of invalid ID in netvsc_send_{completion,tx_complete}()	Andrea Parri (Microsoft)	1	-4/+4
	That being useful for debugging purposes. Notice that the packet descriptor is in "private" guest memory, so that Hyper-V can not tamper with it. While at it, remove two unnecessary u64-casts. Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07	qed: remove an unneed NULL check on list iterator	Xiaomeng Tong	1	-2/+2
	The define for_each_pci_dev(d) is: while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL) Thus, the list iterator 'd' is always non-NULL so it doesn't need to be checked. So just remove the unnecessary NULL check. Also remove the unnecessary initializer because the list iterator is always initialized. Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Link: https://lore.kernel.org/r/20220406015921.29267-1-xiam0nd.tong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07	sfc: Stop using iommu_present()	Robin Murphy	1	-1/+3
	Even if an IOMMU might be present for some PCI segment in the system, that doesn't necessarily mean it provides translation for the device we care about. It appears that what we care about here is specifically whether DMA mapping ops involve any IOMMU overhead or not, so check for translation actually being active for our device. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/7350f957944ecfce6cce90f422e3992a1f428775.1649166055.git.robin.murphy@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07	net: hyperv: remove use of bpf_op_t	Jakub Kicinski	1	-4/+2
	Following patch will hide that typedef. There seems to be no strong reason for hyperv to use it, so let's not. Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07	Merge tag 'hyperv-fixes-signed-20220407' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux	Linus Torvalds	6	-20/+131
	Pull hyperv fixes from Wei Liu: - Correctly propagate coherence information for VMbus devices (Michael Kelley) - Disable balloon and memory hot-add on ARM64 temporarily (Boqun Feng) - Use barrier to prevent reording when reading ring buffer (Michael Kelley) - Use virt_store_mb in favour of smp_store_mb (Andrea Parri) - Fix VMbus device object initialization (Andrea Parri) - Deactivate sysctl_record_panic_msg on isolated guest (Andrea Parri) - Fix a crash when unloading VMbus module (Guilherme G. Piccoli) * tag 'hyperv-fixes-signed-20220407' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb() Drivers: hv: balloon: Disable balloon and hot-add accordingly Drivers: hv: balloon: Support status report for larger page sizes Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer PCI: hv: Propagate coherence from VMbus device to PCI device Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device Drivers: hv: vmbus: Fix potential crash on module unload Drivers: hv: vmbus: Fix initialization of device object in vmbus_device_register() Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests
2022-04-07	Merge tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random	Linus Torvalds	1	-35/+39
	Pull random number generator fixes from Jason Donenfeld: - Another fixup to the fast_init/crng_init split, this time in how much entropy is being credited, from Jan Varho. - As discussed, we now opportunistically call try_to_generate_entropy() in /dev/urandom reads, as a replacement for the reverted commit. I opted to not do the more invasive wait_for_random_bytes() change at least for now, preferring to do something smaller and more obvious for the time being, but maybe that can be revisited as things evolve later. - Userspace can use FUSE or userfaultfd or simply move a process to idle priority in order to make a read from the random device never complete, which breaks forward secrecy, fixed by overwriting sensitive bytes early on in the function. - Jann Horn noticed that /dev/urandom reads were only checking for pending signals if need_resched() was true, a bug going back to the genesis commit, now fixed by always checking for signal_pending() and calling cond_resched(). This explains various noticeable signal delivery delays I've seen in programs over the years that do long reads from /dev/urandom. - In order to be more like other devices (e.g. /dev/zero) and to mitigate the impact of fixing the above bug, which has been around forever (users have never really needed to check the return value of read() for medium-sized reads and so perhaps many didn't), we now move signal checking to the bottom part of the loop, and do so every PAGE_SIZE-bytes. * tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: random: check for signals every PAGE_SIZE chunk of /dev/[u]random random: check for signal_pending() outside of need_resched() check random: do not allow user to keep crng key around on stack random: opportunistically initialize on /dev/urandom reads random: do not split fast init input in add_hwgenerator_randomness()
2022-04-07	Merge tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata	Linus Torvalds	6	-6/+15
	Pull ata fixes from Damien Le Moal: - Fix a compilation warning due to an uninitialized variable in ata_sff_lost_interrupt(), from me. - Fix invalid internal command tag handling in the sata_dwc_460ex driver, from Christian. - Disable READ LOG DMA EXT with Samsung 840 EVO SSDs as this command causes the drives to hang, from Christian. - Change the config option CONFIG_SATA_LPM_POLICY back to its original name CONFIG_SATA_LPM_MOBILE_POLICY to avoid potential problems with users losing their configuration (as discussed during the merge window), from Mario. * tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs ata: sata_dwc_460ex: Fix crash due to OOB write ata: libata-sff: Fix compilation warning in ata_sff_lost_interrupt()
2022-04-07	ice: switch: convert packet template match code to rodata	Alexander Lobakin	1	-107/+108
	Trade text size for rodata size and replace tons of nested if-elses to the const mask match based structs. The almost entire ice_find_dummy_packet() now becomes just one plain while-increment loop. The order in ice_dummy_pkt_profiles[] should be same with the if-elses order previously, as masks become less and less strict through the array to follow the original code flow. Apart from removing 80 locs of 4-level if-elses, it brings a solid text size optimization: add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056) Function old new delta ice_fill_adv_dummy_packet 289 291 +2 ice_adv_add_update_vsi_list 201 - -201 ice_add_adv_rule 2950 2093 -857 Total: Before=414512, After=413456, chg -0.25% add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672) RO Data old new delta ice_dummy_pkt_profiles - 672 +672 Total: Before=37895, After=38567, chg +1.77% Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07	ice: switch: use convenience macros to declare dummy pkt templates	Alexander Lobakin	1	-71/+62
	Declarations of dummy/template packet headers and offsets can be minified to improve readability and simplify adding new templates. Move all the repetitive constructions into two macros and let them do the name and type expansions. Linewrap removal is yet another positive side effect. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07	ice: switch: use a struct to pass packet template params	Alexander Lobakin	1	-174/+94
	ice_find_dummy_packet() contains a lot of boilerplate code and a nice room for copy-paste mistakes. Instead of passing 3 separate pointers back and forth to get packet template (dummy) params, directly return a structure containing them. Then, use a macro to compose compound literals and avoid code duplication on return path. Now, dummy packet type/name is needed only once to return a full correct triple pkt-pkt_len-offsets, and those are all one-liners. dummy_ipv4_gtpu_ipv4_packet_offsets is just moved around and renamed (as well as dummy_ipv6_gtp_packet_offsets) with no function changes. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07	ice: switch: unobscurify bitops loop in ice_fill_adv_dummy_packet()	Alexander Lobakin	1	-7/+9
	A loop performing header modification according to the provided mask in ice_fill_adv_dummy_packet() is very cryptic (and error-prone). Replace two identical cast-deferences with a variable. Replace three struct-member-array-accesses with a variable. Invert the condition, reduce the indentation by one -> eliminate line wraps. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07	ice: switch: add and use u16[] aliases to ice_adv_lkup_elem::{h, m}_u	Alexander Lobakin	2	-10/+17
	ice_adv_lkup_elem fields h_u and m_u are being accessed as raw u16 arrays in several places. To reduce cast and braces burden, add permanent array-of-u16 aliases with the same size as the `union ice_prot_hdr` itself via anonymous unions to the actual struct declaration, and just access them directly. This: - removes the need to cast the union to u16[] and then dereference it each time -> reduces the horizon for potential bugs; - improves -Warray-bounds coverage -- the array size is now known at compilation time; - addresses cppcheck complaints. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-06	drivers: net: slip: fix NPD bug in sl_tx_timeout()	Duoming Zhou	1	-1/+1
	When a slip driver is detaching, the slip_close() will act to cleanup necessary resources and sl->tty is set to NULL in slip_close(). Meanwhile, the packet we transmit is blocked, sl_tx_timeout() will be called. Although slip_close() and sl_tx_timeout() use sl->lock to synchronize, we don`t judge whether sl->tty equals to NULL in sl_tx_timeout() and the null pointer dereference bug will happen. (Thread 1) \| (Thread 2) \| slip_close() \| spin_lock_bh(&sl->lock) \| ... ... \| sl->tty = NULL //(1) sl_tx_timeout() \| spin_unlock_bh(&sl->lock) spin_lock(&sl->lock); \| ... \| ... tty_chars_in_buffer(sl->tty)\| if (tty->ops->..) //(2) \| ... \| synchronize_rcu() We set NULL to sl->tty in position (1) and dereference sl->tty in position (2). This patch adds check in sl_tx_timeout(). If sl->tty equals to NULL, sl_tx_timeout() will goto out. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20220405132206.55291-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06	prestera: acl: add action hw_stats support	Volodymyr Mytnyk	2	-7/+18
	Currently, when user adds a tc action and the action gets offloaded, the user expects the HW stats to be counted also. This limits the amount of supported offloaded filters, as HW counter resources may be quite limited. Without counter assigned, the HW is capable to carry much more filters. To resolve the issue above, the following types of HW stats are offloaded and supported by the driver: any - current default, user does not care about the type. delayed - polled from HW periodically. disabled - no HW stats needed. immediate - not supported. Example: tc filter add dev PORT ingress proto ip flower skip_sw ip_proto 0x11 \ action drop tc filter add dev PORT ingress proto ip flower skip_sw ip_proto 0x12 \ action drop hw_stats disabled tc filter add dev sw1p1 ingress proto ip flower skip_sw ip_proto 0x14 \ action drop hw_stats delayed Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Link: https://lore.kernel.org/r/1649164814-18731-1-git-send-email-volodymyr.mytnyk@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07	random: check for signals every PAGE_SIZE chunk of /dev/[u]random	Jason A. Donenfeld	1	-10/+7
	In 1448769c9cdb ("random: check for signal_pending() outside of need_resched() check"), Jann pointed out that we previously were only checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process had TIF_NEED_RESCHED set, which meant in practice, super long reads to /dev/[u]random would delay signal handling by a long time. I tried this using the below program, and indeed I wasn't able to interrupt a /dev/urandom read until after several megabytes had been read. The bug he fixed has always been there, and so code that reads from /dev/urandom without checking the return value of read() has mostly worked for a long time, for most sizes, not just for <= 256. Maybe it makes sense to keep that code working. The reason it was so small prior, ignoring the fact that it didn't work anyway, was likely because /dev/random used to block, and that could happen for pretty large lengths of time while entropy was gathered. But now, it's just a chacha20 call, which is extremely fast and is just operating on pure data, without having to wait for some external event. In that sense, /dev/[u]random is a lot more like /dev/zero. Taking a page out of /dev/zero's read_zero() function, it always returns at least one chunk, and then checks for signals after each chunk. Chunk sizes there are of length PAGE_SIZE. Let's just copy the same thing for /dev/[u]random, and check for signals and cond_resched() for every PAGE_SIZE amount of data. This makes the behavior more consistent with expectations, and should mitigate the impact of Jann's fix for the age-old signal check bug. ---- test program ---- #include <unistd.h> #include <signal.h> #include <stdio.h> #include <sys/random.h> static unsigned char x[~0U]; static void handle(int) { } int main(int argc, char *argv[]) { pid_t pid = getpid(), child; signal(SIGUSR1, handle); if (!(child = fork())) { for (;;) kill(pid, SIGUSR1); } pause(); printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0)); kill(child, SIGTERM); return 0; } Cc: Jann Horn <jannh@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-06	bnx2x: Fix undefined behavior due to shift overflowing the constant	Borislav Petkov	1	-1/+1
	Fix: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c: In function ‘bnx2x_check_blocks_with_parity3’: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:4917:4: error: case label does not reduce to an integer constant case AEU_INPUTS_ATTN_BITS_MCP_LATCHED_SCPAD_PARITY: ^~~~ See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory details as to why it triggers with older gccs only. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ariel Elior <aelior@marvell.com> Cc: Sudarsana Kalluru <skalluru@marvell.com> Cc: Manish Chopra <manishc@marvell.com> Cc: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220405151517.29753-4-bp@alien8.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>