aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/samples (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-05-14ice: Add XDP frame size to driverJesper Dangaard Brouer1-9/+25
This driver uses different memory models depending on PAGE_SIZE at compile time. For PAGE_SIZE 4K it uses page splitting, meaning for normal MTU frame size is 2048 bytes (and headroom 192 bytes). For larger MTUs the driver still use page splitting, by allocating order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K, driver instead advance its rx_buffer->page_offset with the frame size "truesize". For XDP frame size calculations, this mean that in PAGE_SIZE larger than 4K mode the frame_sz change on a per packet basis. For the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be updated once outside the main NAPI loop. The default setting in the driver uses build_skb(), which provides the necessary headroom and tailroom for XDP-redirect in RX-frame (in both modes). There is one complication, which is legacy-rx mode (configurable via ethtool priv-flags). There are zero headroom in this mode, which is a requirement for XDP-redirect to work. The conversion to xdp_frame (convert_to_xdp_frame) will detect this insufficient space, and xdp_do_redirect() call will fail. This is deemed acceptable, as it allows other XDP actions to still work in legacy-mode. In legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also accept that xdp_adjust_tail shrink doesn't work. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Link: https://lore.kernel.org/bpf/158945347002.97035.328088795813704587.stgit@firesoul
2020-05-14i40e: Add XDP frame size to driverJesper Dangaard Brouer1-5/+25
This driver uses different memory models depending on PAGE_SIZE at compile time. For PAGE_SIZE 4K it uses page splitting, meaning for normal MTU frame size is 2048 bytes (and headroom 192 bytes). For larger MTUs the driver still use page splitting, by allocating order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K, driver instead advance its rx_buffer->page_offset with the frame size "truesize". For XDP frame size calculations, this mean that in PAGE_SIZE larger than 4K mode the frame_sz change on a per packet basis. For the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be updated once outside the main NAPI loop. The default setting in the driver uses build_skb(), which provides the necessary headroom and tailroom for XDP-redirect in RX-frame (in both modes). There is one complication, which is legacy-rx mode (configurable via ethtool priv-flags). There are zero headroom in this mode, which is a requirement for XDP-redirect to work. The conversion to xdp_frame (convert_to_xdp_frame) will detect this insufficient space, and xdp_do_redirect() call will fail. This is deemed acceptable, as it allows other XDP actions to still work in legacy-mode. In legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also accept that xdp_adjust_tail shrink doesn't work. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Link: https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul
2020-05-14ixgbevf: Add XDP frame size to VF driverJesper Dangaard Brouer1-7/+27
This patch mirrors the changes to ixgbe in previous patch. This VF driver doesn't support XDP_REDIRECT, but correct tailroom is still necessary for BPF-helper xdp_adjust_tail. In legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we accept that xdp_adjust_tail shrink doesn't work. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Link: https://lore.kernel.org/bpf/158945345984.97035.13518286183248025173.stgit@firesoul
2020-05-14ixgbe: Add XDP frame size to driverJesper Dangaard Brouer1-8/+26
This driver uses different memory models depending on PAGE_SIZE at compile time. For PAGE_SIZE 4K it uses page splitting, meaning for normal MTU frame size is 2048 bytes (and headroom 192 bytes). For larger MTUs the driver still use page splitting, by allocating order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K, driver instead advance its rx_buffer->page_offset with the frame size "truesize". For XDP frame size calculations, this mean that in PAGE_SIZE larger than 4K mode the frame_sz change on a per packet basis. For the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be updated once outside the main NAPI loop. The default setting in the driver uses build_skb(), which provides the necessary headroom and tailroom for XDP-redirect in RX-frame (in both modes). There is one complication, which is legacy-rx mode (configurable via ethtool priv-flags). There are zero headroom in this mode, which is a requirement for XDP-redirect to work. The conversion to xdp_frame (convert_to_xdp_frame) will detect this insufficient space, and xdp_do_redirect() call will fail. This is deemed acceptable, as it allows other XDP actions to still work in legacy-mode. In legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also accept that xdp_adjust_tail shrink doesn't work. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Link: https://lore.kernel.org/bpf/158945345455.97035.14334355929030628741.stgit@firesoul
2020-05-14ixgbe: Fix XDP redirect on archs with PAGE_SIZE above 4KJesper Dangaard Brouer1-1/+2
The ixgbe driver have another memory model when compiled on archs with PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page in two halves, but instead increment rx_buffer->page_offset by truesize of packet (which include headroom and tailroom for skb_shared_info). This is done correctly in ixgbe_build_skb(), but in ixgbe_rx_buffer_flip which is currently only called on XDP_TX and XDP_REDIRECT, it forgets to add the tailroom for skb_shared_info. This breaks XDP_REDIRECT, for veth and cpumap. Fix by adding size of skb_shared_info tailroom. Maintainers notice: This fix have been queued to Jeff. Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Link: https://lore.kernel.org/bpf/158945344946.97035.17031588499266605743.stgit@firesoul
2020-05-14virtio_net: Add XDP frame size in two code pathsJesper Dangaard Brouer1-3/+12
The virtio_net driver is running inside the guest-OS. There are two XDP receive code-paths in virtio_net, namely receive_small() and receive_mergeable(). The receive_big() function does not support XDP. In receive_small() the frame size is available in buflen. The buffer backing these frames are allocated in add_recvbuf_small() with same size, except for the headroom, but tailroom have reserved room for skb_shared_info. The headroom is encoded in ctx pointer as a value. In receive_mergeable() the frame size is more dynamic. There are two basic cases: (1) buffer size is based on a exponentially weighted moving average (see DECLARE_EWMA) of packet length. Or (2) in case virtnet_get_headroom() have any headroom then buffer size is PAGE_SIZE. The ctx pointer is this time used for encoding two values; the buffer len "truesize" and headroom. In case (1) if the rx buffer size is underestimated, the packet will have been split over more buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of buffer area). If that happens the XDP path does a xdp_linearize_page operation. V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang. The code is really hard to follow, so some hints to reviewers. The receive_mergeable() case gets frames that were allocated in add_recvbuf_mergeable() which uses headroom=virtnet_get_headroom(), and 'buf' ptr is advanced this headroom. The headroom can only be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really simple: static unsigned int virtnet_get_headroom(struct virtnet_info *vi) { return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0; } As frame_sz is an offset size from xdp.data_hard_start, reviewers should notice how this is calculated in receive_mergeable(): int offset = buf - page_address(page); [...] data = page_address(xdp_page) + offset; xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len; The calculated offset will always be VIRTIO_XDP_HEADROOM when reaching this code. Thus, xdp.data_hard_start will be page-start address plus vi->hdr_len. Given this xdp.frame_sz need to be reduced with vi->hdr_len size. IMHO a followup patch should cleanup this code to make it easier to maintain and understand, but it is outside the scope of this patchset. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945344436.97035.9445115070189151680.stgit@firesoul
2020-05-14vhost_net: Also populate XDP frame sizeJesper Dangaard Brouer1-0/+1
In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start) which contains the buffer length 'buflen' (with tailroom for skb_shared_info). Also storing this buflen in xdp->frame_sz, does not obsolete struct tun_xdp_hdr, as it also contains a struct virtio_net_hdr with other information. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945343928.97035.4620233649151726289.stgit@firesoul
2020-05-14tun: Add XDP frame sizeJesper Dangaard Brouer1-0/+2
The tun driver have two code paths for running XDP (bpf_prog_run_xdp). In both cases 'buflen' contains enough tailroom for skb_shared_info. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945343419.97035.9594485183958037621.stgit@firesoul
2020-05-14nfp: Add XDP frame size to netronome driverJesper Dangaard Brouer1-0/+6
The netronome nfp driver use PAGE_SIZE when xdp_prog is set, but xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM. Thus, adjust for this when setting xdp.frame_sz, as it counts from data_hard_start. When doing XDP_TX this driver is smart and instead of a full DMA-map does a DMA-sync on with packet length. As xdp_adjust_tail can now grow packet length, add checks to make sure that grow size is within the DMA-mapped size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/158945342911.97035.11214251236208648808.stgit@firesoul
2020-05-14net: thunderx: Add XDP frame sizeJesper Dangaard Brouer1-0/+1
To help reviewers these are the defines related to RCV_FRAG_LEN #define DMA_BUFFER_LEN 1536 /* In multiples of 128bytes */ #define RCV_FRAG_LEN (SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Sunil Goutham <sgoutham@marvell.com> Cc: Robert Richter <rrichter@marvell.com> Link: https://lore.kernel.org/bpf/158945342402.97035.12649844447148990032.stgit@firesoul
2020-05-14mlx4: Add XDP frame size and adjust max XDP MTUJesper Dangaard Brouer2-1/+3
The mlx4 drivers size of memory backing the RX packet is stored in frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096). For normal mode frag_stride is 2048. Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Link: https://lore.kernel.org/bpf/158945341893.97035.2688142527052329942.stgit@firesoul
2020-05-14ena: Add XDP frame size to amazon NIC driverJesper Dangaard Brouer2-2/+4
Frame size ENA_PAGE_SIZE is limited to 16K on systems with larger PAGE_SIZE than 16K. Change ENA_XDP_MAX_MTU to also take into account the reserved tailroom. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Sameeh Jubran <sameehj@amazon.com> Cc: Arthur Kiyanovski <akiyano@amazon.com> Link: https://lore.kernel.org/bpf/158945341384.97035.907403694833419456.stgit@firesoul
2020-05-14net: ethernet: ti: Add XDP frame size to driver cpswJesper Dangaard Brouer2-0/+2
The driver code cpsw.c and cpsw_new.c both use page_pool with default order-0 pages or their RX-pages. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/158945340875.97035.752144756428532878.stgit@firesoul
2020-05-14qlogic/qede: Add XDP frame size to driverJesper Dangaard Brouer2-1/+2
The driver qede uses a full page, when XDP is enabled. The drivers value in rx_buf_seg_size (struct qede_rx_queue) will be PAGE_SIZE when an XDP bpf_prog is attached. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Ariel Elior <aelior@marvell.com> Cc: GR-everest-linux-l2@marvell.com Link: https://lore.kernel.org/bpf/158945340366.97035.7764939691580349618.stgit@firesoul
2020-05-14hv_netvsc: Add XDP frame size to driverJesper Dangaard Brouer2-1/+2
The hyperv NIC driver does memory allocation and copy even without XDP. In XDP mode it will allocate a new page for each packet and copy over the payload, before invoking the XDP BPF-prog. The positive thing it that its easy to determine the xdp.frame_sz. The XDP implementation for hv_netvsc transparently passes xdp_prog to the associated VF NIC. Many of the Azure VMs are using SRIOV, so majority of the data are actually processed directly on the VF driver's XDP path. So the overhead of the synthetic data path (hv_netvsc) is minimal. Then XDP is enabled on this driver, XDP_PASS and XDP_TX will create the SKB via build_skb (based on the newly allocated page). Now using XDP frame_sz this will provide more skb_tailroom, which netstack can use for SKB coalescing (e.g tcp_try_coalesce -> skb_try_coalesce). V3: Adjust patch desc to be more positive. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Link: https://lore.kernel.org/bpf/158945339857.97035.10212138582505736163.stgit@firesoul
2020-05-14dpaa2-eth: Add XDP frame sizeJesper Dangaard Brouer1-0/+7
The dpaa2-eth driver reserve some headroom used for hardware and software annotation area in RX/TX buffers. Thus, xdp.data_hard_start doesn't start at page boundary. When XDP is configured the area reserved via dpaa2_fd_get_offset(fd) is 448 bytes of which XDP have reserved 256 bytes. As frame_sz is calculated as an offset from xdp_buff.data_hard_start, an adjust from the full PAGE_SIZE == DPAA2_ETH_RX_BUF_RAW_SIZE. When doing XDP_REDIRECT, the driver doesn't need this reserved headroom any-longer and allows xdp_do_redirect() to use it. This is an advantage for the drivers own ndo-xdp_xmit, as it uses part of this headroom for itself. Patch also adjust frame_sz in this case. The driver cannot support XDP data_meta, because it uses the headroom just before xdp.data for struct dpaa2_eth_swa (DPAA2_ETH_SWA_SIZE=64), when transmitting the packet. When transmitting a xdp_frame in dpaa2_eth_xdp_xmit_frame (call via ndo_xdp_xmit) is uses this area to store a pointer to xdp_frame and dma_size, which is used in TX completion (free_tx_fd) to return frame via xdp_return_frame(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com> Link: https://lore.kernel.org/bpf/158945339348.97035.8562488847066908856.stgit@firesoul
2020-05-14veth: Xdp using frame_sz in veth driverJesper Dangaard Brouer1-9/+13
The veth driver can run XDP in "native" mode in it's own NAPI handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring") packets can come in two forms either xdp_frame or skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb(). For packets to arrive in xdp_frame format, they will have been redirected from an XDP native driver. In case of XDP_PASS or no XDP-prog attached, the veth driver will allocate and create an SKB. The current code in veth_xdp_rcv_one() xdp_frame case, had to guess the frame truesize of the incoming xdp_frame, when using veth_build_skb(). With xdp_frame->frame_sz this is not longer necessary. Calculating the frame_sz in veth_xdp_rcv_skb() skb case, is done similar to the XDP-generic handling code in net/core/dev.c. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Link: https://lore.kernel.org/bpf/158945338840.97035.935897116345700902.stgit@firesoul
2020-05-14veth: Adjust hard_start offset on redirect XDP framesJesper Dangaard Brouer1-4/+4
When native XDP redirect into a veth device, the frame arrives in the xdp_frame structure. It is then processed in veth_xdp_rcv_one(), which can run a new XDP bpf_prog on the packet. Doing so requires converting xdp_frame to xdp_buff, but the tricky part is that xdp_frame memory area is located in the top (data_hard_start) memory area that xdp_buff will point into. The current code tried to protect the xdp_frame area, by assigning xdp_buff.data_hard_start past this memory. This results in 32 bytes less headroom to expand into via BPF-helper bpf_xdp_adjust_head(). This protect step is actually not needed, because BPF-helper bpf_xdp_adjust_head() already reserve this area, and don't allow BPF-prog to expand into it. Thus, it is safe to point data_hard_start directly at xdp_frame memory area. Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring") Reported-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945338331.97035.5923525383710752178.stgit@firesoul
2020-05-14xdp: Cpumap redirect use frame_sz and increase skb_tailroomJesper Dangaard Brouer1-18/+3
Knowing the memory size backing the packet/xdp_frame data area, and knowing it already have reserved room for skb_shared_info, simplifies using build_skb significantly. With this change we no-longer lie about the SKB truesize, but more importantly a significant larger skb_tailroom is now provided, e.g. when drivers uses a full PAGE_SIZE. This extra tailroom (in linear area) can be used by the network stack when coalescing SKBs (e.g. in skb_try_coalesce, see TCP cases where tcp_queue_rcv() can 'eat' skb). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945337822.97035.13557959180460986059.stgit@firesoul
2020-05-14xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frameJesper Dangaard Brouer2-1/+21
Use hole in struct xdp_frame, when adding member frame_sz, which keeps same sizeof struct (32 bytes) Drivers ixgbe and sfc had bug cases where the necessary/expected tailroom was not reserved. This can lead to some hard to catch memory corruption issues. Having the drivers frame_sz this can be detected when packet length/end via xdp->data_end exceed the xdp_data_hard_end pointer, which accounts for the reserved the tailroom. When detecting this driver issue, simply fail the conversion with NULL, which results in feedback to driver (failing xdp_do_redirect()) causing driver to drop packet. Given the lack of consistent XDP stats, this can be hard to troubleshoot. And given this is a driver bug, we want to generate some more noise in form of a WARN stack dump (to ID the driver code that inlined convert_to_xdp_frame). Inlining the WARN macro is problematic, because it adds an asm instruction (on Intel CPUs ud2) what influence instruction cache prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this and at the same time make identifying the function and line of this inlined function easier. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945337313.97035.10015729316710496600.stgit@firesoul
2020-05-14net: XDP-generic determining XDP frame sizeJesper Dangaard Brouer1-6/+8
The SKB "head" pointer points to the data area that contains skb_shared_info, that can be found via skb_end_pointer(). Given xdp->data_hard_start have been established (basically pointing to skb->head), frame size is between skb_end_pointer() and data_hard_start, plus the size reserved to skb_shared_info. Change the bpf_xdp_adjust_tail offset adjust of skb->len, to be a positive offset number on grow, and negative number on shrink. As this seems more natural when reading the code. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945336804.97035.7164852191163722056.stgit@firesoul
2020-05-14net: netsec: Add support for XDP frame sizeIlias Apalodimas1-12/+18
This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that can help reduce the number of cache-lines that need to be flushed when doing DMA sync for_device. Due to xdp_adjust_tail can grow the area accessible to the by the CPU (can possibly write into), then max sync length *after* bpf_prog_run_xdp() needs to be taken into account. For XDP_TX action the driver is smart and does DMA-sync. When growing tail this is still safe, because page_pool have DMA-mapped the entire page size. Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/bpf/158945336295.97035.15034759661036971024.stgit@firesoul
2020-05-14mvneta: Add XDP frame size to driverJesper Dangaard Brouer1-10/+15
This marvell driver mvneta uses PAGE_SIZE frames, which makes it really easy to convert. Driver updates rxq and now frame_sz once per NAPI call. This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that can help reduce the number of cache-lines that need to be flushed when doing DMA sync for_device. Due to xdp_adjust_tail can grow the area accessible to the by the CPU (can possibly write into), then max sync length *after* bpf_prog_run_xdp() needs to be taken into account. For XDP_TX action the driver is smart and does DMA-sync. When growing tail this is still safe, because page_pool have DMA-mapped the entire page size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Cc: thomas.petazzoni@bootlin.com Link: https://lore.kernel.org/bpf/158945335786.97035.12714388304493736747.stgit@firesoul
2020-05-14sfc: Add XDP frame sizeJesper Dangaard Brouer1-0/+1
This driver uses RX page-split when possible. It was recently fixed in commit 86e85bf6981c ("sfc: fix XDP-redirect in this driver") to add needed tailroom for XDP-redirect. After the fix efx->rx_page_buf_step is the frame size, with enough head and tail-room for XDP-redirect. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158945335278.97035.14611425333184621652.stgit@firesoul
2020-05-14bnxt: Add XDP frame size to driverJesper Dangaard Brouer1-0/+1
This driver uses full PAGE_SIZE pages when XDP is enabled. In case of XDP uses driver uses __bnxt_alloc_rx_page which does full page DMA-map. Thus, xdp_adjust_tail grow is DMA compliant for XDP_TX action that does DMA-sync. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com> Link: https://lore.kernel.org/bpf/158945334769.97035.13437970179897613984.stgit@firesoul
2020-05-14xdp: Add frame size to xdp_buffJesper Dangaard Brouer1-0/+13
XDP have evolved to support several frame sizes, but xdp_buff was not updated with this information. The frame size (frame_sz) member of xdp_buff is introduced to know the real size of the memory the frame is delivered in. When introducing this also make it clear that some tailroom is reserved/required when creating SKBs using build_skb(). It would also have been an option to introduce a pointer to data_hard_end (with reserved offset). The advantage with frame_sz is that (like rxq) drivers only need to setup/assign this value once per NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to store frame_sz inside xdp_rxq_info, because it's varies per packet as it can be based/depend on packet length. V2: nitpick: deduct -> deduce Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945334261.97035.555255657490688547.stgit@firesoul
2020-05-14selftests/bpf: Test for sk helpers in cgroup skbAndrey Ignatov2-0/+192
Test bpf_sk_lookup_tcp, bpf_sk_release, bpf_sk_cgroup_id and bpf_sk_ancestor_cgroup_id helpers from cgroup skb program. The test creates a testing cgroup, starts a TCPv6 server inside the cgroup and creates two client sockets: one inside testing cgroup and one outside. Then it attaches cgroup skb program to the cgroup that checks all TCP segments coming to the server and allows only those coming from the cgroup of the server. If a segment comes from a peer outside of the cgroup, it'll be dropped. Finally the test checks that client from inside testing cgroup can successfully connect to the server, but client outside the cgroup fails to connect by timeout. The main goal of the test is to check newly introduced bpf_sk_{,ancestor_}cgroup_id helpers. It also checks a couple of socket lookup helpers (tcp & release), but lookup helpers were introduced much earlier and covered by other tests. Here it's mostly checked that they can be called from cgroup skb. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/171f4c5d75e8ff4fe1c4e8c1c12288b5240a4549.1589486450.git.rdna@fb.com
2020-05-14selftests/bpf: Add connect_fd_to_fd, connect_wait net helpersAndrey Ignatov2-13/+63
Add two new network helpers. connect_fd_to_fd connects an already created client socket fd to address of server fd. Sometimes it's useful to separate client socket creation and connecting this socket to a server, e.g. if client socket has to be created in a cgroup different from that of server cgroup. Additionally connect_to_fd is now implemented using connect_fd_to_fd, both helpers don't treat EINPROGRESS as an error and let caller decide how to proceed with it. connect_wait is a helper to work with non-blocking client sockets so that if connect_to_fd or connect_fd_to_fd returned -1 with errno == EINPROGRESS, caller can wait for connect to finish or for connection timeout. The helper returns -1 on error, 0 on timeout (1sec, hard-coded), and positive number on success. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/1403fab72300f379ca97ead4820ae43eac4414ef.1589486450.git.rdna@fb.com
2020-05-14bpf: Introduce bpf_sk_{, ancestor_}cgroup_id helpersAndrey Ignatov3-11/+121
With having ability to lookup sockets in cgroup skb programs it becomes useful to access cgroup id of retrieved sockets so that policies can be implemented based on origin cgroup of such socket. For example, a container running in a cgroup can have cgroup skb ingress program that can lookup peer socket that is sending packets to a process inside the container and decide whether those packets should be allowed or denied based on cgroup id of the peer. More specifically such ingress program can implement intra-host policy "allow incoming packets only from this same container and not from any other container on same host" w/o relying on source IP addresses since quite often it can be the case that containers share same IP address on the host. Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and bpf_sk_ancestor_cgroup_id(). These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id helpers with the only difference that sk is used to get cgroup id instead of skb, and share code with them. See documentation in UAPI for more details. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com
2020-05-14bpf: Allow skb_ancestor_cgroup_id helper in cgroup skbAndrey Ignatov1-0/+2
cgroup skb programs already can use bpf_skb_cgroup_id. Allow bpf_skb_ancestor_cgroup_id as well so that container policies can be implemented for a container that can have sub-cgroups dynamically created, but policies should still be implemented based on cgroup id of container itself not on an id of a sub-cgroup. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/8874194d6041eba190356453ea9f6071edf5f658.1589486450.git.rdna@fb.com
2020-05-14bpf: Allow sk lookup helpers in cgroup skbAndrey Ignatov1-0/+8
Currently sk lookup helpers are allowed in tc, xdp, sk skb, and cgroup sock_addr programs. But they would be useful in cgroup skb as well so that for example cgroup skb ingress program can lookup a peer socket a packet comes from on same host and make a decision whether to allow or deny this packet based on the properties of that socket, e.g. cgroup that peer socket belongs to. Allow the following sk lookup helpers in cgroup skb: * bpf_sk_lookup_tcp; * bpf_sk_lookup_udp; * bpf_sk_release; * bpf_skc_lookup_tcp. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/f8c7ee280f1582b586629436d777b6db00597d63.1589486450.git.rdna@fb.com
2020-05-14selftest/bpf: Fix spelling mistake "SIGALARM" -> "SIGALRM"Colin Ian King1-1/+1
There is a spelling mistake in an error message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200514121529.259668-1-colin.king@canonical.com
2020-05-14bpf: Fix bpf_iter's task iterator logicAndrii Nakryiko1-1/+7
task_seq_get_next might stop prematurely if get_pid_task() fails to get task_struct. Failure to do so doesn't mean that there are no more tasks with higher pids. Procfs's iteration algorithm (see next_tgid in fs/proc/base.c) does a retry in such case. After this fix, instead of stopping prematurely after about 300 tasks on my server, bpf_iter program now returns >4000, which sounds much closer to reality. Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200514055137.1564581-1-andriin@fb.com
2020-05-14selftests/bpf: Test narrow loads for bpf_sock_addr.user_portAndrey Ignatov1-10/+28
Test 1,2,4-byte loads from bpf_sock_addr.user_port in sock_addr programs. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/e5c734a58cca4041ab30cb5471e644246f8cdb5a.1589420814.git.rdna@fb.com
2020-05-14bpf: Support narrow loads from bpf_sock_addr.user_portAndrey Ignatov3-10/+9
bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly code in BPF programs, like: volatile __u32 user_port = ctx->user_port; __u16 port = bpf_ntohs(user_port); Since otherwise clang may optimize the load to be 2-byte and it's rejected by verifier. Add support for 1- and 2-byte loads same way as it's supported for other fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com
2020-05-14samples/bpf: xdp_redirect_cpu: Set MAX_CPUS according to NR_CPUSLorenzo Bianconi2-14/+17
xdp_redirect_cpu is currently failing in bpf_prog_load_xattr() allocating cpu_map map if CONFIG_NR_CPUS is less than 64 since cpu_map_alloc() requires max_entries to be less than NR_CPUS. Set cpu_map max_entries according to NR_CPUS in xdp_redirect_cpu_kern.c and get currently running cpus in xdp_redirect_cpu_user.c Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/374472755001c260158c4e4b22f193bdd3c56fb7.1589300442.git.lorenzo@kernel.org
2020-05-14MAINTAINERS: Mark networking drivers as Maintained.David S. Miller1-1/+1
Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14r8169: don't include linux/moduleparam.hHeiner Kallweit1-1/+0
93882c6f210a ("r8169: switch from netif_xxx message functions to netdev_xxx") removed the last module parameter from the driver, therefore there's no need any longer to include linux/moduleparam.h. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14r8169: remove not needed checks in rtl8169_set_eeeHeiner Kallweit1-6/+0
After 9de5d235b60a ("net: phy: fix aneg restart in phy_ethtool_set_eee") we don't need the check for aneg being enabled any longer, and as discussed with Russell configuring the EEE advertisement should be supported even if we're in a half-duplex mode currently. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14net: dsa: felix: fix incorrect clamp calculation for burstColin Ian King1-1/+1
Currently burst is clamping on rate and not burst, the assignment of burst from the clamping discards the previous assignment of burst. This looks like a cut-n-paste error from the previous clamping calculation on ramp. Fix this by replacing ramp with burst. Addresses-Coverity: ("Unused value") Fixes: 0fbabf875d18 ("net: dsa: felix: add support Credit Based Shaper(CBS) for hardware offload") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14ipmr: Add lockdep expression to ipmr_for_each_table macroAmol Grover1-3/+4
During the initialization process, ipmr_new_table() is called to create new tables which in turn calls ipmr_get_table() which traverses net->ipv4.mr_tables without holding the writer lock. However, this is safe to do so as no tables exist at this time. Hence add a suitable lockdep expression to silence the following false-positive warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc3-next-20200428-syzkaller #0 Not tainted ----------------------------- net/ipv4/ipmr.c:136 RCU-list traversed in non-reader section!! ipmr_get_table+0x130/0x160 net/ipv4/ipmr.c:136 ipmr_new_table net/ipv4/ipmr.c:403 [inline] ipmr_rules_init net/ipv4/ipmr.c:248 [inline] ipmr_net_init+0x133/0x430 net/ipv4/ipmr.c:3089 Fixes: f0ad0860d01e ("ipv4: ipmr: support multiple tables") Reported-by: syzbot+1519f497f2f9f08183c6@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14ipmr: Fix RCU list debugging warningAmol Grover1-1/+2
ipmr_for_each_table() macro uses list_for_each_entry_rcu() for traversing outside of an RCU read side critical section but under the protection of rtnl_mutex. Hence, add the corresponding lockdep expression to silence the following false-positive warning at boot: [ 4.319347] ============================= [ 4.319349] WARNING: suspicious RCU usage [ 4.319351] 5.5.4-stable #17 Tainted: G E [ 4.319352] ----------------------------- [ 4.319354] net/ipv4/ipmr.c:1757 RCU-list traversed in non-reader section!! Fixes: f0ad0860d01e ("ipv4: ipmr: support multiple tables") Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14net: phy: mdio-moxart: remove unneeded includeBartosz Golaszewski1-1/+0
mdio-moxart doesn't use regulators in the driver code. We can remove the regulator include. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14dt-bindings: dp83867: Convert DP83867 to yamlDan Murphy2-68/+127
Convert the dp83867 binding to yaml. Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14dt-bindings: net: dp83869: Update licensing infoDan Murphy1-1/+1
Add BSD 2 Clause to the licensing. CC: Rob Herring <robh@kernel.org> Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.cMadhuparna Bhowmik1-1/+2
This patch fixes the following warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc5-next-20200514-syzkaller #0 Not tainted ----------------------------- drivers/net/hamradio/bpqether.c:149 RCU-list traversed in non-reader section!! Since rtnl lock is held, pass this cond in list_for_each_entry_rcu(). Reported-by: syzbot+bb82cafc737c002d11ca@syzkaller.appspotmail.com Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810Kevin Lo2-2/+7
Set the correct bit when checking for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54810 PHY. Fixes: 0ececcfc9267 ("net: phy: broadcom: Allow BCM54810 to use bcm54xx_adjust_rxrefclk()") Signed-off-by: Kevin Lo <kevlo@kevlo.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14hinic: update huawei ethernet driver maintainerLuo bin1-1/+1
update huawei ethernet driver maintainer from aviad to Bin luo Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14hinic: add set_ringparam ethtool_ops supportLuo bin11-15/+104
support to change TX/RX queue depth with ethtool -G Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14devlink: refactor end checks in devlink_nl_cmd_region_read_dumpitJakub Kicinski2-25/+31
Clean up after recent fixes, move address calculations around and change the variable init, so that we can have just one start_offset == end_offset check. Make the check a little stricter to preserve the -EINVAL error if requested start offset is larger than the region itself. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>