aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/xen-netback/netback.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2014-10-29xen-netback: Remove __GFP_COLDZoltan Kiss1-1/+1
This flag is unnecessary, it came from some old code. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25xen-netback: reintroduce guest Rx stall detectionDavid Vrabel1-0/+76
If a frontend not receiving packets it is useful to detect this and turn off the carrier so packets are dropped early instead of being queued and drained when they expire. A to-guest queue is stalled if it doesn't have enough free slots for a an extended period of time (default 60 s). If at least one queue is stalled, the carrier is turned off (in the expectation that the other queues will soon stall as well). The carrier is only turned on once all queues are ready. When the frontend connects, all the queues start in the stalled state and only become ready once the frontend queues enough Rx requests. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25xen-netback: fix unlimited guest Rx internal queue and carrier flappingDavid Vrabel1-118/+125
Netback needs to discard old to-guest skb's (guest Rx queue drain) and it needs detect guest Rx stalls (to disable the carrier so packets are discarded earlier), but the current implementation is very broken. 1. The check in hard_start_xmit of the slot availability did not consider the number of packets that were already in the guest Rx queue. This could allow the queue to grow without bound. The guest stops consuming packets and the ring was allowed to fill leaving S slot free. Netback queues a packet requiring more than S slots (ensuring that the ring stays with S slots free). Netback queue indefinately packets provided that then require S or fewer slots. 2. The Rx stall detection is not triggered in this case since the (host) Tx queue is not stopped. 3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback will consider this an Rx purge event which may result in it taking the carrier down unnecessarily. It also considers a queue with only 1 slot free as unstalled (even though the next packet might not fit in this). The internal guest Rx queue is limited by a byte length (to 512 Kib, enough for half the ring). The (host) Tx queue is stopped and started based on this limit. This sets an upper bound on the amount of memory used by packets on the internal queue. This allows the estimatation of the number of slots for an skb to be removed (it wasn't a very good estimate anyway). Instead, the guest Rx thread just waits for enough free slots for a maximum sized packet. skbs queued on the internal queue have an 'expires' time (set to the current time plus the drain timeout). The guest Rx thread will detect when the skb at the head of the queue has expired and discard expired skbs. This sets a clear upper bound on the length of time an skb can be queued for. For a guest being destroyed the maximum time needed to wait for all the packets it sent to be dropped is still the drain timeout (10 s) since it will not be sending new packets. Rx stall detection is reintroduced in a later commit. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13xen-netback: don't stop dealloc kthread too earlyWei Liu1-7/+19
Reference count the number of packets in host stack, so that we don't stop the deallocation thread too early. If not, we can end up with xenvif_free permanently waiting for deallocation thread to unmap grefs. Reported-by: Thomas Leonard <talex5@gmail.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-07xen-netback: Fix vif->disable handlingZoltan Kiss1-2/+8
In the patch called "xen-netback: Turn off the carrier if the guest is not able to receive" new branches were introduced to this if statement, risking that a queue with non-zero id can reenable the disabled interface. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05xen-netback: Turn off the carrier if the guest is not able to receiveZoltan Kiss1-14/+83
Currently when the guest is not able to receive more packets, qdisc layer starts a timer, and when it goes off, qdisc is started again to deliver a packet again. This is a very slow way to drain the queues, consumes unnecessary resources and slows down other guests shutdown. This patch change the behaviour by turning the carrier off when that timer fires, so all the packets are freed up which were stucked waiting for that vif. Instead of the rx_queue_purge bool it uses the VIF_STATUS_RX_PURGE_EVENT bit to signal the thread that either the timeout happened or an RX interrupt arrived, so the thread can check what it should do. It also disables NAPI, so the guest can't transmit, but leaves the interrupts on, so it can resurrect. Only the queues which brought down the interface can enable it again, the bit QUEUE_STATUS_RX_STALLED makes sure of that. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05xen-netback: Using a new state bit instead of carrierZoltan Kiss1-1/+1
This patch introduces a new state bit VIF_STATUS_CONNECTED to track whether the vif is in a connected state. Using carrier will not work with the next patch in this series, which aims to turn the carrier temporarily off if the guest doesn't seem to be able to receive packets. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org v2: - rename the bitshift type to "enum state_bit_shift" here, not in the next patch Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20xen-netback: Fix pointer incrementation to avoid incorrect loggingZoltan Kiss1-1/+1
Due to this pointer is increased prematurely, the error log contains rubbish. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Armin Zentai <armin.zentai@ezit.hu> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20xen-netback: Fix releasing header slot on error pathZoltan Kiss1-5/+33
This patch makes this function aware that the first frag and the header might share the same ring slot. That could happen if the first slot is bigger than PKT_PROT_LEN. Due to this the error path might release that slot twice or never, depending on the error scenario. xenvif_idx_release is also removed from xenvif_idx_unmap, and called separately. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Armin Zentai <armin.zentai@ezit.hu> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20xen-netback: Fix releasing frag_list skbs in error pathZoltan Kiss1-0/+9
When the grant operations failed, the skb is freed up eventually, and it tries to release the frags, if there is any. For the main skb nr_frags is set to 0 to avoid this, but on the frag_list it iterates through the frags array, and tries to call put_page on the page pointer which contains garbage at that time. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Armin Zentai <armin.zentai@ezit.hu> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20xen-netback: Fix handling frag_list on grant op error pathZoltan Kiss1-17/+20
The error handling for skb's with frag_list was completely wrong, it caused double unmap attempts to happen if the error was on the first skb. Move it to the right place in the loop. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Armin Zentai <armin.zentai@ezit.hu> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08xen-netback: Adding debugfs "io_ring_qX" filesZoltan Kiss1-0/+11
This patch adds debugfs capabilities to netback. There used to be a similar patch floating around for classic kernel, but it used procfs. It is based on a very similar blkback patch. It creates xen-netback/[vifname]/io_ring_q[queueno] files, reading them output various ring variables etc. Writing "kick" into it imitates an interrupt happened, it can be useful to check whether the ring is just stalled due to a missed interrupt. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05xen-netback: Fix handling of skbs requiring too many slotsZoltan Kiss1-11/+25
A recent commit (a02eb4 "xen-netback: worse-case estimate in xenvif_rx_action is underestimating") capped the slot estimation to MAX_SKB_FRAGS, but that triggers the next BUG_ON a few lines down, as the packet consumes more slots than estimated. This patch introduces full_coalesce on the skb callback buffer, which is used in start_new_rx_buffer() to decide whether netback needs coalescing more aggresively. By doing that, no packet should need more than (XEN_NETIF_MAX_TX_SIZE + 1) / PAGE_SIZE data slots (excluding the optional GSO slot, it doesn't carry data, therefore irrelevant in this case), as the provided buffers are fully utilized. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Cc: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04xen-netback: Add support for multiple queuesAndrew J. Bennieston1-0/+8
Builds on the refactoring of the previous patch to implement multiple queues between xen-netfront and xen-netback. Writes the maximum supported number of queues into XenStore, and reads the values written by the frontend to determine how many queues to use. Ring references and event channels are read from XenStore on a per-queue basis and rings are connected accordingly. Also adds code to handle the cleanup of any already initialised queues if the initialisation of a subsequent queue fails. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04xen-netback: Factor queue-specific data into queue structWei Liu1-357/+353
In preparation for multi-queue support in xen-netback, move the queue-specific data from struct xenvif into struct xenvif_queue, and update the rest of the code to use this. Also adds loops over queues where appropriate, even though only one is configured at this point, and uses alloc_netdev_mq() and the corresponding multi-queue netif wake/start/stop functions in preparation for multiple active queues. Finally, implements a trivial queue selection function suitable for ndo_select_queue, which simply returns 0 for a single queue and uses skb_get_hash() to compute the queue index otherwise. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16xen-netback: fix race between napi_complete() and interrupt handlerDavid Vrabel1-2/+2
When the NAPI budget was not all used, xenvif_poll() would call napi_complete() /after/ enabling the interrupt. This resulted in a race between the napi_complete() and the napi_schedule() in the interrupt handler. The use of local_irq_save/restore() avoided by race iff the handler is running on the same CPU but not if it was running on a different CPU. Fix this properly by calling napi_complete() before reenabling interrupts (in the xenvif_napi_schedule_or_enable_irq() call). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15xen-netback: Fix grant ref resolution in RX pathZoltan Kiss1-18/+80
The original series for reintroducing grant mapping for netback had a patch [1] to handle receiving of packets from an another VIF. Grant copy on the receiving side needs the grant ref of the page to set up the op. The original patch assumed (wrongly) that the frags array haven't changed. In the case reported by Sander, the sending guest sent a packet where the linear buffer and the first frag were under PKT_PROT_LEN (=128) bytes. xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the first frag. The receiving side had an off-by-one problem when gathered the grant refs. This patch fixes that by checking whether the actual frag's page pointer is the same as the page in the original frag list. It can handle any kind of changes on the original frags array, like: - removing granted frags from the array at any point - adding local pages to the frags list anywhere - reordering the frags It's optimized to the most common case, when there is 1:1 relation between the frags and the list, plus works optimal when frags are removed from the end or the beginning. [1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-04xen-netback: Trivial format string fixZoltan Kiss1-2/+2
There is a "%" after pending_idx instead of ":". Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03xen-netback: Grant copy the header instead of map and memcpyZoltan Kiss1-53/+69
An old inefficiency of the TX path that we are grant mapping the first slot, and then copy the header part to the linear area. Instead, doing a grant copy for that header straight on is more reasonable. Especially because there are ongoing efforts to make Xen avoiding TLB flush after unmap when the page were not touched in Dom0. In the original way the memcpy ruined that. The key changes: - the vif has a tx_copy_ops array again - xenvif_tx_build_gops sets up the grant copy operations - we don't have to figure out whether the header and first frag are on the same grant mapped page or not Note, we only grant copy PKT_PROT_LEN bytes from the first slot, the rest (if any) will be on the first frag, which is grant mapped. If the first slot is smaller than PKT_PROT_LEN, then we grant copy that, and later __pskb_pull_tail will pull more from the frags (if any) Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03xen-netback: Rename map opsZoltan Kiss1-22/+24
Rename identifiers to state explicitly that they refer to map ops. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-01xen-netback: disable rogue vif in kthread contextWei Liu1-2/+14
When netback discovers frontend is sending malformed packet it will disables the interface which serves that frontend. However disabling a network interface involving taking a mutex which cannot be done in softirq context, so we need to defer this process to kthread context. This patch does the following: 1. introduce a flag to indicate the interface is disabled. 2. check that flag in TX path, don't do any work if it's true. 3. check that flag in RX path, turn off that interface if it's true. The reason to disable it in RX path is because RX uses kthread. After this change the behavior of netback is still consistent -- it won't do any TX work for a rogue frontend, and the interface will be eventually turned off. Also change a "continue" to "break" after xenvif_fatal_tx_err, as it doesn't make sense to continue processing packets if frontend is rogue. This is a fix for XSA-90. Reported-by: Török Edwin <edwin@etorok.net> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29xen-netback: BUG_ON in xenvif_rx_action() not catching overflowPaul Durrant1-1/+7
The BUG_ON to catch ring overflow in xenvif_rx_action() makes the assumption that meta_slots_used == ring slots used. This is not necessarily the case for GSO packets, because the non-prefix GSO protocol consumes one more ring slot than meta-slot for the 'extra_info'. This patch changes the test to actually check ring slots. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29xen-netback: worse-case estimate in xenvif_rx_action is underestimatingPaul Durrant1-1/+20
The worse-case estimate for skb ring slot usage in xenvif_rx_action() fails to take fragment page_offset into account. The page_offset does, however, affect the number of times the fragmentation code calls start_new_rx_buffer() (i.e. consume another slot) and the worse-case should assume that will always return true. This patch adds the page_offset into the DIV_ROUND_UP for each frag. Unfortunately some frontends aggressively limit the number of requests they post into the shared ring so to avoid an estimate that is 'too' pessimal it is capped at MAX_SKB_FRAGS. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29xen-netback: remove pointless clause from if statementPaul Durrant1-2/+2
This patch removes a test in start_new_rx_buffer() that checks whether a copy operation is less than MAX_BUFFER_OFFSET in length, since MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Reported-By: Sander Eikelenboom <linux@eikelenboom.it> Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26xen-netback: Functional follow-up patch for grant mapping seriesZoltan Kiss1-7/+19
Ian made some late comments about the grant mapping series, I incorporated the functional outcomes into this patch: - use callback_param macro to shorten access to pending_tx_info in xenvif_fill_frags() and xenvif_tx_submit() - print an error message in xenvif_idx_unmap() before panic Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26xen-netback: Non-functional follow-up patch for grant mapping seriesZoltan Kiss1-3/+1
Ian made some late comments about the grant mapping series, I incorporated the non-functional outcomes into this patch: - typo fixes in a comment of xenvif_free(), and add another one there as well - typo fix for comment of rx_drain_timeout_msecs - remove stale comment before calling xenvif_grant_handle_reset() Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26xen-netback: Stop using xenvif_tx_pending_slots_availableZoltan Kiss1-11/+2
Since the early days TX stops if there isn't enough free pending slots to consume a maximum sized (slot-wise) packet. Probably the reason for that is to avoid the case when we don't have enough free pending slot in the ring to finish the packet. But if we make sure that the pending ring has the same size as the shared ring, that shouldn't really happen. The frontend can only post packets which fit the to the free space of the shared ring. If it doesn't, the frontend has to stop, as it can only increase the req_prod when the whole packet fits onto the ring. This patch avoid using this checking, makes sure the 2 ring has the same size, and remove a checking from the callback. As now we don't stop the NAPI instance on this condition, we don't have to wake it up if we free pending slots up. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25xen-netback: Proper printf format for ptrdiff_t is 't'.David S. Miller1-1/+1
This fixes: drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_dealloc_action’: drivers/net/xen-netback/netback.c:1573:8: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘long int’ [-Wformat=] Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25Revert "xen-netback: Aggregate TX unmap operations"Zoltan Kiss1-33/+1
This reverts commit e9275f5e2df1b2098a8cc405d87b88b9affd73e6. This commit is the last in the netback grant mapping series, and it tries to do more aggressive aggreagtion of unmap operations. However practical use showed almost no positive effect, whilst with certain frontends it causes significant performance regression. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10Xen-netback: Fix issue caused by using gso_type wronglyAnnie Li1-21/+18
Current netback uses gso_type to check whether the skb contains gso offload, and this is wrong. Gso_size is the right one to check gso existence, and gso_type is only used to check gso type. Some skbs contains nonzero gso_type and zero gso_size, current netback would treat these skbs as gso and create wrong response for this. This also causes ssh failure to domu from other server. V2: use skb_is_gso function as Paul Durrant suggested Signed-off-by: Annie Li <annie.li@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Aggregate TX unmap operationsZoltan Kiss1-1/+33
Unmapping causes TLB flushing, therefore we should make it in the largest possible batches. However we shouldn't starve the guest for too long. So if the guest has space for at least two big packets and we don't have at least a quarter ring to unmap, delay it for at most 1 milisec. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Timeout packets in RX pathZoltan Kiss1-3/+20
A malicious or buggy guest can leave its queue filled indefinitely, in which case qdisc start to queue packets for that VIF. If those packets came from an another guest, it can block its slots and prevent shutdown. To avoid that, we make sure the queue is drained in every 10 seconds. The QDisc queue in worst case takes 3 round to flush usually. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Handle guests with too many fragsZoltan Kiss1-10/+154
Xen network protocol had implicit dependency on MAX_SKB_FRAGS. Netback has to handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that: - create a new skb - map the leftover slots to its frags (no linear buffer here!) - chain it to the previous through skb_shinfo(skb)->frag_list - map them - copy and coalesce the frags into a brand new one and send it to the stack - unmap the 2 old skb's pages It's also introduces new stat counters, which help determine how often the guest sends a packet with more than MAX_SKB_FRAGS frags. NOTE: if bisect brought you here, you should apply the series up until "xen-netback: Timeout packets in RX path", otherwise malicious guests can block other guests by not releasing their sent packets. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Add stat counters for zerocopyZoltan Kiss1-1/+8
These counters help determine how often the buffers had to be copied. Also they help find out if packets are leaked, as if "sent != success + fail", there are probably packets never freed up properly. NOTE: if bisect brought you here, you should apply the series up until "xen-netback: Timeout packets in RX path", otherwise Windows guests can't work properly and malicious guests can block other guests by not releasing their sent packets. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Remove old TX grant copy definitons and fix indentationsZoltan Kiss1-58/+14
These became obsolete with grant mapping. I've left intentionally the indentations in this way, to improve readability of previous patches. NOTE: if bisect brought you here, you should apply the series up until "xen-netback: Timeout packets in RX path", otherwise Windows guests can't work properly and malicious guests can block other guests by not releasing their sent packets. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Introduce TX grant mappingZoltan Kiss1-160/+272
This patch introduces grant mapping on netback TX path. It replaces grant copy operations, ditching grant copy coalescing along the way. Another solution for copy coalescing is introduced in "xen-netback: Handle guests with too many frags", older guests and Windows can broke before that patch applies. There is a callback (xenvif_zerocopy_callback) from core stack to release the slots back to the guests when kfree_skb or skb_orphan_frags called. It feeds a separate dealloc thread, as scheduling NAPI instance from there is inefficient, therefore we can't do dealloc from the instance. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Handle foreign mapped pages on the guest RX pathZoltan Kiss1-5/+43
RX path need to know if the SKB fragments are stored on pages from another domain. Logically this patch should be after introducing the grant mapping itself, as it makes sense only after that. But to keep bisectability, I moved it here. It shouldn't change any functionality here. xenvif_zerocopy_callback and ubuf_to_vif are just stubs here, they will be introduced properly later on. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Minor refactoring of netback codeZoltan Kiss1-19/+3
This patch contains a few bits of refactoring before introducing the grant mapping changes: - introducing xenvif_tx_pending_slots_available(), as this is used several times, and will be used more often - rename the thread to vifX.Y-guest-rx, to signify it does RX work from the guest point of view Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07xen-netback: Use skb->cb for pending_idxZoltan Kiss1-17/+25
Storing the pending_idx at the first byte of the linear buffer never looked good, skb->cb is a more proper place for this. It also prevents the header to be directly grant copied there, and we don't have the pending_idx after we copied the header here, so it's time to change it. It also introduces helpers for the RX side Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-05xen-netback: Fix Rx stall due to race conditionZoltan Kiss1-10/+6
The recent patch to fix receive side flow control (11b57f90257c1d6a91cee720151b69e0c2020cf6: xen-netback: stop vif thread spinning if frontend is unresponsive) solved the spinning thread problem, however caused an another one. The receive side can stall, if: - [THREAD] xenvif_rx_action sets rx_queue_stopped to true - [INTERRUPT] interrupt happens, and sets rx_event to true - [THREAD] then xenvif_kthread sets rx_event to false - [THREAD] rx_work_todo doesn't return true anymore Also, if interrupt sent but there is still no room in the ring, it take quite a long time until xenvif_rx_action realize it. This patch ditch that two variable, and rework rx_work_todo. If the thread finds it can't fit more skb's into the ring, it saves the last slot estimation into rx_last_skb_slots, otherwise it's kept as 0. Then rx_work_todo will check if: - there is something to send to the ring (like before) - there is space for the topmost packet in the queue I think that's more natural and optimal thing to test than two bool which are set somewhere else. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14xen-netback: use new skb_checksum_setup functionPaul Durrant1-257/+3
Use skb_checksum_setup to set up partial checksum offsets rather then a private implementation. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09xen-netback: stop vif thread spinning if frontend is unresponsivePaul Durrant1-5/+9
The recent patch to improve guest receive side flow control (ca2f09f2) had a slight flaw in the wait condition for the vif thread in that any remaining skbs in the guest receive side netback internal queue would prevent the thread from sleeping. An unresponsive frontend can lead to a permanently non-empty internal queue and thus the thread will spin. In this case the thread should really sleep until the frontend becomes responsive again. This patch adds an extra flag to the vif which is set if the shared ring is full and cleared when skbs are drained into the shared ring. Thus, if the thread runs, finds the shared ring full and can make no progress the flag remains set. If the flag remains set then the thread will sleep, regardless of a non-empty queue, until the next event from the frontend. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29xen-netback: fix guest-receive-side array sizesPaul Durrant1-1/+1
The sizes chosen for the metadata and grant_copy_op arrays on the guest receive size are wrong; - The meta array is needlessly twice the ring size, when we only ever consume a single array element per RX ring slot - The grant_copy_op array is way too small. It's sized based on a bogus assumption: that at most two copy ops will be used per ring slot. This may have been true at some point in the past but it's clear from looking at start_new_rx_buffer() that a new ring slot is only consumed if a frag would overflow the current slot (plus some other conditions) so the actual limit is MAX_SKB_FRAGS grant_copy_ops per ring slot. This patch fixes those two sizing issues and, because grant_copy_ops grows so much, it pulls it out into a separate chunk of vmalloc()ed memory. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19xen-netback: add gso_segs calculationPaul Durrant1-4/+15
netback already has code which parses IPv4 and v6 headers to set up checksum offsets and these are always applied to GSO packets being sent from frontends. It's therefore suboptimal that GSOs are being marked SKB_GSO_DODGY to defer the gso_segs calculation when netback already has all necessary information to hand to do the calculation. This patch adds that calculation. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19xen-netback: fix some error return codeWei Yongjun1-4/+12
'err' is overwrited to 0 after maybe_pull_tail() call, so the error code was not set if skb_partial_csum_set() call failed. Fix to return error -EPROTO from those error handling case instead of 0. Fixes: d52eb0d46f36 ('xen-netback: make sure skb linear area covers checksum field') Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17xen-netback: fix fragments error handling in checksum_setup_ip()Wei Yongjun1-0/+3
Fix to return -EPROTO error if fragments detected in checksum_setup_ip(). Fixes: 1431fb31ecba ('xen-netback: fix fragment detection in checksum setup') Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12xen-netback: fix gso_prefix checkPaul Durrant1-1/+1
There is a mistake in checking the gso_prefix mask when passing large packets to a guest. The wrong shift is applied to the bit - the raw skb gso type is used rather then the translated one. This leads to large packets being handed to the guest without the GSO metadata. This patch fixes the check. The mistake manifested as errors whilst running Microsoft HCK large packet offload tests between a pair of Windows 8 VMs. I have verified this patch fixes those errors. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12xen-netback: napi: don't prematurely request a tx eventPaul Durrant1-1/+1
This patch changes the RING_FINAL_CHECK_FOR_REQUESTS in xenvif_build_tx_gops to a check for RING_HAS_UNCONSUMED_REQUESTS as the former call has the side effect of advancing the ring event pointer and therefore inviting another interrupt from the frontend before the napi poll has actually finished, thereby defeating the point of napi. The event pointer is updated by RING_FINAL_CHECK_FOR_REQUESTS in xenvif_poll, the napi poll function, if the work done is less than the budget i.e. when actually transitioning back to interrupt mode. Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12xen-netback: napi: fix abuse of budgetPaul Durrant1-7/+7
netback seems to be somewhat confused about the napi budget parameter. The parameter is supposed to limit the number of skbs processed in each poll, but netback has this confused with grant operations. This patch fixes that, properly limiting the work done in each poll. Note that this limit makes sure we do not process any more data from the shared ring than we intend to pass back from the poll. This is important to prevent tx_queue potentially growing without bound. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11xen-netback: make sure skb linear area covers checksum fieldPaul Durrant1-32/+28
skb_partial_csum_set requires that the linear area of the skb covers the checksum field. The checksum setup code in netback was only doing that pullup in the case when the pseudo header checksum was being recalculated though. This patch makes that pullup unconditional. (I pullup the whole transport header just for simplicity; the requirement is only for the check field but in the case of UDP this is the last field in the header and in the case of TCP it's the last but one). The lack of pullup manifested as failures running Microsoft HCK network tests on a pair of Windows 8 VMs and it has been verified that this patch fixes the problem. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>