aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/virtio_net.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-05-23virtio-net: correctly transmit XDP buff after linearizingJason Wang1-1/+1
We should not go for the error path after successfully transmitting a XDP buffer after linearizing. Since the error path may try to pop and drop next packet and increase the drop counters. Fixing this by simply drop the refcnt of original page and go for xmit path. Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23virtio-net: correctly redirect linearized packetJason Wang1-1/+1
After a linearized packet was redirected by XDP, we should not go for the err path which will try to pop buffers for the next packet and increase the drop counter. Fixing this by just drop the page refcnt for the original page. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Reported-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23bpf: fix virtio-net's length calc for XDP_PASSNikita V. Shirokov1-1/+1
In commit 6870de435b90 ("bpf: make virtio compatible w/ bpf_xdp_adjust_tail") i didn't account for vi->hdr_len during new packet's length calculation after bpf_prog_run in receive_mergeable. because of this all packets, if they were passed to the kernel, were truncated by 12 bytes. Fixes:6870de435b90 ("bpf: make virtio compatible w/ bpf_xdp_adjust_tail") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-31/+48
Conflicts were simple overlapping changes in microchip driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19virtio_net: sparse annotation fixMichael S. Tsirkin1-1/+1
offloads is a buffer in virtio format, should use the __virtio64 tag. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19virtio_net: fix adding vids on big-endianMichael S. Tsirkin1-3/+3
Programming vids (adding or removing them) still passes guest-endian values in the DMA buffer. That's wrong if guest is big-endian and when virtio 1 is enabled. Note: this is on top of a previous patch: virtio_net: split out ctrl buffer Fixes: 9465a7a6f ("virtio_net: enable v1.0 support") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19virtio_net: split out ctrl bufferMichael S. Tsirkin1-29/+39
When sending control commands, virtio net sets up several buffers for DMA. The buffers are all part of the net device which means it's actually allocated by kvmalloc so it's in theory (on extreme memory pressure) possible to get a vmalloc'ed buffer which on some platforms means we can't DMA there. Fix up by moving the DMA buffers into a separate structure. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18bpf: make virtio compatible w/ bpf_xdp_adjust_tailNikita V. Shirokov1-1/+6
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for virtio driver we need to adjust XDP_PASS handling by recalculating length of the packet if it was passed to the TCP/IP stack Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-17xdp: transition into using xdp_frame for ndo_xdp_xmitJesper Dangaard Brouer1-10/+14
Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct xdp_buff. This brings xdp_return_frame and ndp_xdp_xmit in sync. This builds towards changing the API further to become a bulk API, because xdp_buff is not a queue-able object while xdp_frame is. V4: Adjust for commit 59655a5b6c83 ("tuntap: XDP_TX can use native XDP") V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17xdp: transition into using xdp_frame for return APIJesper Dangaard Brouer1-1/+1
Changing API xdp_return_frame() to take struct xdp_frame as argument, seems like a natural choice. But there are some subtle performance details here that needs extra care, which is a deliberate choice. When de-referencing xdp_frame on a remote CPU during DMA-TX completion, result in the cache-line is change to "Shared" state. Later when the page is reused for RX, then this xdp_frame cache-line is written, which change the state to "Modified". This situation already happens (naturally) for, virtio_net, tun and cpumap as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache-lines (with pointers) between CPUs. Thus, the only option is to de-referencing xdp_frame. It is only the ixgbe driver that had an optimization, in which it can avoid doing the de-reference of xdp_frame. The driver already have TX-ring queue, which (in case of remote DMA-TX completion) have to be transferred between CPUs anyhow. In this data area, we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid de-referencing xdp_frame. To compensate for this, a prefetchw is used for telling the cache coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate the ixgbe driver. V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") V8: Adjust for commit bd658dda4237 ("net/mlx5e: Separate dma base address and offset in dma_sync call") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17xdp: rhashtable with allocator ID to pointer mappingJesper Dangaard Brouer1-0/+7
Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which uses a radix tree, use a dynamic rhashtable, for creating ID to pointer lookup table, because this is faster. The problem that is being solved here is that, the xdp_rxq_info pointer (stored in xdp_buff) cannot be used directly, as the guaranteed lifetime is too short. The info is needed on a (potentially) remote CPU during DMA-TX completion time . In an xdp_frame the xdp_mem_info is stored, when it got converted from an xdp_buff, which is sufficient for the simple page refcnt based recycle schemes. For more advanced allocators there is a need to store a pointer to the registered allocator. Thus, there is a need to guard the lifetime or validity of the allocator pointer, which is done through this rhashtable ID map to pointer. The removal and validity of of the allocator and helper struct xdp_mem_allocator is guarded by RCU. The allocator will be created by the driver, and registered with xdp_rxq_info_reg_mem_model(). It is up-to debate who is responsible for freeing the allocator pointer or invoking the allocator destructor function. In any case, this must happen via RCU freeing. Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. V4: Per req of Jason Wang - Use xdp_rxq_info_reg_mem_model() in all drivers implementing XDP_REDIRECT, even-though it's not strictly necessary when allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero). V6: Per req of Alex Duyck - Introduce rhashtable_lookup() call in later patch V8: Address sparse should be static warnings (from kbuild test robot) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17virtio_net: convert to use generic xdp_frame and xdp_return_frame APIJesper Dangaard Brouer1-25/+29
The virtio_net driver assumes XDP frames are always released based on page refcnt (via put_page). Thus, is only queues the XDP data pointer address and uses virt_to_head_page() to retrieve struct page. Use the XDP return API to get away from such assumptions. Instead queue an xdp_frame, which allow us to use the xdp_return_frame API, when releasing the frame. V8: Avoid endianness issues (found by kbuild test robot) V9: Change __virtnet_xdp_xmit from bool to int return value (found by Dan Carpenter) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-13virtio-net: add missing virtqueue kick when flushing packetsJason Wang1-2/+9
We tends to batch submitting packets during XDP_TX. This requires to kick virtqueue after a batch, we tried to do it through xdp_do_flush_map() which only makes sense for devmap not XDP_TX. So explicitly kick the virtqueue in this case. Reported-by: Kimitoshi Takahashi <ktaka@nii.ac.jp> Tested-by: Kimitoshi Takahashi <ktaka@nii.ac.jp> Cc: Daniel Borkmann <daniel@iogearbox.net> Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUSJay Vosburgh1-1/+1
The operstate update logic will leave an interface in the default UNKNOWN operstate if the interface carrier state never changes from the default carrier up state set at creation. This includes the case of an explicit call to netif_carrier_on, as the carrier on to on transition has no effect on operstate. This affects virtio-net for the case that the virtio peer does not support VIRTIO_NET_F_STATUS (the feature that provides carrier state updates). Without this feature, the virtio specification states that "the link should be assumed active," so, logically, the operstate should be UP instead of UNKNOWN. This has impact on user space applications that use the operstate to make availability decisions for the interface. Resolve this by changing the virtio probe logic slightly to call netif_carrier_off for both the "with" and "without" VIRTIO_NET_F_STATUS cases, and then the existing call to netif_carrier_on for the "without" case will cause an operstate transition. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04virtio-net: re enable XDP_REDIRECT for mergeable bufferJason Wang1-12/+42
XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() case"). This is because we don't reserve enough tailroom for struct skb_shared_info which breaks XDP assumption. So this patch fixes this by reserving enough tailroom and using fixed size of rx buffer. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28virtio-net: disable NAPI only when enabled during XDP setJason Wang1-3/+5
We try to disable NAPI to prevent a single XDP TX queue being used by multiple cpus. But we don't check if device is up (NAPI is enabled), this could result stall because of infinite wait in napi_disable(). Fixing this by checking device state through netif_running() before. Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDPJesper Dangaard Brouer1-1/+11
When a driver implements the ndo_xdp_xmit() function, there is (currently) no generic way to determine whether it is safe to call. It is e.g. unsafe to call the drivers ndo_xdp_xmit, if it have not allocated the needed XDP TX queues yet. This is the case for virtio_net, which first allocates the XDP TX queues once an XDP/bpf prog is attached (in virtnet_xdp_set()). Thus, a crash will occur for virtio_net when redirecting to another virtio_net device's ndo_xdp_xmit, which have not attached a XDP prog. The sample xdp_redirect_map tries to attach a dummy XDP prog to take this into account, but it can also easily fail if the virtio_net (or actually underlying vhost driver) have not allocated enough extra queues for the device. Allocating more queue this is currently a manual config. Hint for libvirt XML add: <driver name='vhost' queues='16'> <host mrg_rxbuf='off'/> <guest tso4='off' tso6='off' ecn='off' ufo='off'/> </driver> The solution in this patch is to check that the device have loaded an XDP/bpf prog before proceeding. This is similar to the check performed in driver ixgbe. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21virtio_net: fix memory leak in XDP_REDIRECTJesper Dangaard Brouer1-15/+22
XDP_REDIRECT calling xdp_do_redirect() can fail for multiple reasons (which can be inspected by tracepoints). The current semantics is that on failure the driver calling xdp_do_redirect() must handle freeing or recycling the page associated with this frame. This can be seen as an optimization, as drivers usually have an optimized XDP_DROP code path for frame recycling in place already. The virtio_net driver didn't handle when xdp_do_redirect() failed. This caused a memory leak as the page refcnt wasn't decremented on failures. The function __virtnet_xdp_xmit() did handle one type of failure, when the xmit queue virtqueue_add_outbuf() is full, which "hides" releasing a refcnt on the page. Instead the function __virtnet_xdp_xmit() must follow API of xdp_do_redirect(), which on errors leave it up to the caller to free the page, of the failed send operation. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21virtio_net: fix XDP code path in receive_small()Jesper Dangaard Brouer1-1/+1
When configuring virtio_net to use the code path 'receive_small()', in-order to get correct XDP_REDIRECT support, I discovered TCP packets would get silently dropped when loading an XDP program action XDP_PASS. The bug seems to be that receive_small() when XDP is loaded check that hdr->hdr.flags is zero, which seems wrong as hdr.flags contains the flags VIRTIO_NET_HDR_F_* : #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */ #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */ TCP got dropped as it had the VIRTIO_NET_HDR_F_DATA_VALID flag set. The flags that are relevant here are the VIRTIO_NET_HDR_GSO_* flags stored in hdr->hdr.gso_type. Thus, the fix is just check that none of the gso_type flags have been set. Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21virtio_net: disable XDP_REDIRECT in receive_mergeable() caseJesper Dangaard Brouer1-7/+0
The virtio_net code have three different RX code-paths in receive_buf(). Two of these code paths can handle XDP, but one of them is broken for at least XDP_REDIRECT. Function(1): receive_big() does not support XDP. Function(2): receive_small() support XDP fully and uses build_skb(). Function(3): receive_mergeable() broken XDP_REDIRECT uses napi_alloc_skb(). The simple explanation is that receive_mergeable() is broken because it uses napi_alloc_skb(), which violates XDP given XDP assumes packet header+data in single page and enough tail room for skb_shared_info. The longer explaination is that receive_mergeable() tries to work-around and satisfy these XDP requiresments e.g. by having a function xdp_linearize_page() that allocates and memcpy RX buffers around (in case packet is scattered across multiple rx buffers). This does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because we have not implemented bpf_xdp_adjust_tail yet). The XDP_REDIRECT action combined with cpumap is broken, and cause hard to debug crashes. The main issue is that the RX packet does not have the needed tail-room (SKB_DATA_ALIGN(skb_shared_info)), causing skb_shared_info to overlap the next packets head-room (in which cpumap stores info). Reproducing depend on the packet payload length and if RX-buffer size happened to have tail-room for skb_shared_info or not. But to make this even harder to troubleshoot, the RX-buffer size is runtime dynamically change based on an Exponentially Weighted Moving Average (EWMA) over the packet length, when refilling RX rings. This patch only disable XDP_REDIRECT support in receive_mergeable() case, because it can cause a real crash. IMHO we should consider NOT supporting XDP in receive_mergeable() at all, because the principles behind XDP are to gain speed by (1) code simplicity, (2) sacrificing memory and (3) where possible moving runtime checks to setup time. These principles are clearly being violated in receive_mergeable(), that e.g. runtime track average buffer size to save memory consumption. In the longer run, we should consider introducing a separate receive function when attaching an XDP program, and also change the memory model to be compatible with XDP when attaching an XDP prog. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22virtio_net: Add ethtool statsToshiaki Makita1-50/+141
The main purpose of this patch is adding a way of checking per-queue stats. It's useful to debug performance problems on multiqueue environment. $ ethtool -S ens10 NIC statistics: rx_queue_0_packets: 2090408 rx_queue_0_bytes: 3164825094 rx_queue_1_packets: 2082531 rx_queue_1_bytes: 3152932314 tx_queue_0_packets: 2770841 tx_queue_0_bytes: 4194955474 tx_queue_1_packets: 3084697 tx_queue_1_bytes: 4670196372 This change converts existing per-cpu stats structure into per-queue one. This should not impact on performance since each queue counter is not updated concurrently by multiple cpus. Performance numbers: - Guest has 2 vcpus and 2 queues - Guest runs netserver - Host runs 100-flow super_netperf Before After Diff UDP_STREAM 18byte 86.22 87.00 +0.90% UDP_STREAM 1472byte 4055.27 4042.18 -0.32% TCP_STREAM 16956.32 16890.63 -0.39% UDP_RR 178667.11 185862.70 +4.03% TCP_RR 128473.04 124985.81 -2.71% Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09virtio_net: propagate linkspeed/duplex settings from the hypervisorJason Baron1-1/+22
The ability to set speed and duplex for virtio_net is useful in various scenarios as described here: 16032be virtio_net: add ethtool support for set and get of settings However, it would be nice to be able to set this from the hypervisor, such that virtio_net doesn't require custom guest ethtool commands. Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows the hypervisor to export a linkspeed and duplex setting. The user can subsequently overwrite it later if desired via: 'ethtool -s'. Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention is that device feature bits are to grow down from bit 63, since the transports are starting from bit 24 and growing up. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtio-dev@lists.oasis-open.org Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05virtio_net: setup xdp_rxq_infoJesper Dangaard Brouer1-1/+13
The virtio_net driver doesn't dynamically change the RX-ring queue layout and backing pages, but instead reject XDP setup if all the conditions for XDP is not meet. Thus, the xdp_rxq_info also remains fairly static. This allow us to simply add the reg/unreg to net_device open/close functions. Driver hook points for xdp_rxq_info: * reg : virtnet_open * unreg: virtnet_close V3: - bugfix, also setup xdp.rxq in receive_mergeable() - Tested bpf-sample prog inside guest on a virtio_net device Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Conflict was two parallel additions of include files to sch_generic.c, no biggie. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-1/+1
Pull virtio bugfixes from Michael Tsirkin: "A couple of minor bugfixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: fix return value check in receive_mergeable() virtio_mmio: add cleanup for virtio_mmio_remove virtio_mmio: add cleanup for virtio_mmio_probe
2017-12-08virtio_net: Disable interrupts if napi_complete_done rescheduled napiToshiaki Makita1-3/+6
Since commit 39e6c8208d7b ("net: solve a NAPI race") napi has been able to be rescheduled within napi_complete_done() even in non-busypoll case, but virtnet_poll() always enabled interrupts before complete, and when napi was rescheduled within napi_complete_done() it did not disable interrupts. This caused more interrupts when event idx is disabled. According to commit cbdadbbf0c79 ("virtio_net: fix race in RX VQ processing") we cannot place virtqueue_enable_cb_prepare() after NAPI_STATE_SCHED is cleared, so disable interrupts again if napi_complete_done() returned false. Tested with vhost-user of OVS 2.7 on host, which does not have the event idx feature. * Before patch: $ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472 MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 212992 1472 60.00 32763206 0 6430.32 212992 60.00 23384299 4589.56 Interrupts on guest: 9872369 Packets/interrupt: 2.37 * After patch $ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472 MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 212992 1472 60.00 32794646 0 6436.49 212992 60.00 32793501 6436.27 Interrupts on guest: 4941299 Packets/interrupt: 6.64 Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07virtio_net: fix return value check in receive_mergeable()Yunjian Wang1-1/+1
The function virtqueue_get_buf_ctx() could return NULL, the return value 'buf' need to be checked with NULL, not value 'ctx'. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-11-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+0
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ...
2017-11-15mm: remove __GFP_COLDMel Gorman1-1/+0
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-05net: bpf: rename ndo_xdp to ndo_bpfJakub Kicinski1-2/+2
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26bpf: add meta pointer for direct accessDaniel Borkmann1-0/+2
This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data. struct xdp_buff is therefore laid out that we first point to data_hard_start, then data_meta directly prepended to data followed by data_end marking the end of packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room. xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF, thus having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be multiple of 4 byte up to 32 byte large. Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail. The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from it's original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-22virtio-net: correctly set xdp_xmit for mergeable bufferJason Wang1-1/+1
We should set xdp_xmit only when xdp_do_redirect() succeed. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20virtio-net: support XDP_REDIRECTJason Wang1-15/+62
This patch tries to add XDP_REDIRECT for virtio-net. The changes are not complex as we could use exist XDP_TX helpers for most of the work. The rest is passing the XDP_TX to NAPI handler for implementing batching. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20virtio-net: add packet len average only when needed during XDPJason Wang1-3/+3
There's no need to add packet len average in the case of XDP_PASS since it will be done soon after skb is created. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20virtio-net: remove unnecessary parameter of virtnet_xdp_xmit()Jason Wang1-3/+2
CC: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24virtio_net: be drop monitor friendlyEric Dumazet1-1/+1
This change is needed to not fool drop monitor. (perf record ... -e skb:kfree_skb ) Packets were properly sent and are consumed after TX completion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18net: drop unused attribute argument from sysfs queue funcsstephen hemminger1-1/+1
The show and store functions don't need/use the attribute. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16virtio: put paren around sizeofstephen hemminger1-1/+1
Kernel coding style is to put paren around operand of sizeof. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-13virtio-net: make array guest_offloads staticColin Ian King1-4/+6
The array guest_offloads is local to the source and does not need to be in global scope, so make it static. Also tweak formatting. Cleans up sparse warnings: symbol 'guest_offloads' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-4/+3
Two minor conflicts in virtio_net driver (bug fix overlapping addition of a helper) and MAINTAINERS (new driver edit overlapping revamp of PHY entry). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-3/+2
Pull networking fixes from David Miller: 1) Handle notifier registry failures properly in tun/tap driver, from Tonghao Zhang. 2) Fix bpf verifier handling of subtraction bounds and add a testcase for this, from Edward Cree. 3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt. 4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET, from Cong Wang. 5) Fix SElinux regression due to recent UDP optimizations, from Paolo Abeni. 6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code paths, fix from Stefano Brivio. 7) Fix some mem leaks in dccp, from Xin Long. 8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd Bergmann. 9) Mac address length check and buffer size fixes from Cong Wang. 10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni. 11) Fix return value when copy_from_user() fails in bpf_prog_get_info_by_fd(), from Daniel Borkmann. 12) Handle PHY_HALTED properly in phy library state machine, from Florian Fainelli. 13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel. 14) Fix truesize calculation in virtio_net which led to performance regressions, from Michael S Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) samples/bpf: fix bpf tunnel cleanup udp6: fix jumbogram reception ppp: Fix a scheduling-while-atomic bug in del_chan Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config" virtio_net: fix truesize for mergeable buffers mv643xx_eth: fix of_irq_to_resource() error check MAINTAINERS: Add more files to the PHY LIBRARY section ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev() net: phy: Correctly process PHY_HALTED in phy_stop_machine() sunhme: fix up GREG_STAT and GREG_IMASK register offsets bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len tcp: avoid bogus gcc-7 array-bounds warning net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt" bpf: don't indicate success when copy_from_user fails udp6: fix socket leak on early demux net: thunderx: Fix BGX transmit stall due to underflow Revert "vhost: cache used event for better performance" team: use a larger struct for mac address net: check dev->addr_len for dev_set_mac_address() phy: bcm-ns-usb3: fix MDIO_BUS dependency ...
2017-07-31virtio_net: fix truesize for mergeable buffersMichael S. Tsirkin1-3/+2
Seth Forshee noticed a performance degradation with some workloads. This turns out to be due to packet drops. Euan Kemp noticed that this is because we drop all packets where length exceeds the truesize, but for some packets we add in extra memory without updating the truesize. This in turn was kept around unchanged from ab7db91705e95 ("virtio-net: auto-tune mergeable rx buffer size for improved performance"). That commit had an internal reason not to account for the extra space: not enough bits to do it. No longer true so let's account for the allocated length exactly. Many thanks to Seth Forshee for the report and bisecting and Euan Kemp for debugging the issue. Fixes: 680557cf79f8 ("virtio_net: rework mergeable buffer handling") Reported-by: Euan Kemp <euan.kemp@coreos.com> Tested-by: Euan Kemp <euan.kemp@coreos.com> Reported-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25virtio-net: mark PM functions as __maybe_unusedArnd Bergmann1-4/+2
After removing the reset function, the freeze and restore functions are now unused when CONFIG_PM_SLEEP is disabled: drivers/net/virtio_net.c:1881:12: error: 'virtnet_restore_up' defined but not used [-Werror=unused-function] static int virtnet_restore_up(struct virtio_device *vdev) drivers/net/virtio_net.c:1859:13: error: 'virtnet_freeze_down' defined but not used [-Werror=unused-function] static void virtnet_freeze_down(struct virtio_device *vdev) A more robust way to do this is to remove the #ifdef around the callers and instead mark them as __maybe_unused. The compiler will now just silently drop the unused code. Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25virtio-net: fix module unloadingAndrew Jones1-1/+1
Unregister the driver before removing multi-instance hotplug callbacks. This order avoids the warning issued from __cpuhp_remove_state_cpuslocked when the number of remaining instances isn't yet zero. Fixes: 8017c279196a ("net/virtio-net: Convert to hotplug state machine") Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-24virtio-net: switch off offloads on demand if possible on XDP setJason Wang1-5/+65
Current XDP implementation wants guest offloads feature to be disabled on device. This is inconvenient and means guest can't benefit from offloads if XDP is not used. This patch tries to address this limitation by disabling the offloads on demand through control guest offloads. Guest offloads will be disabled and enabled on demand on XDP set. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24virtio-net: do not reset during XDP setJason Wang1-126/+106
We currently reset the device during XDP set, the main reason is that we allocate more headroom with XDP (for header adjustment). This works but causes network downtime for users. Previous patches encoded the headroom in the buffer context, this makes it possible to detect the case where a buffer with headroom insufficient for XDP is added to the queue and XDP is enabled afterwards. Upon detection, we handle this case by copying the packet (slow, but it's a temporary condition). Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24virtio-net: switch to use new ctx API for small bufferJason Wang1-5/+12
Use ctx API to store headroom for small buffers. Following patches will retrieve this info and use it for XDP. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24virtio-net: pack headroom into ctx for mergeable buffersJason Wang1-5/+24
Pack headroom into ctx - this way when we get a buffer we can figure out the actual headroom that was allocated for the buffer. Will be helpful to optimize switching between XDP and non-XDP modes which have different headroom requirements. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17virtio_net: Remove references to NETIF_F_UFO.David S. Miller1-4/+2
It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>