aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking/af_xdp.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking/af_xdp.rst')
-rw-r--r--Documentation/networking/af_xdp.rst106
1 files changed, 77 insertions, 29 deletions
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 5bc55a4e3bce..60b217b436be 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -243,8 +243,8 @@ Configuration Flags and Socket Options
These are the various configuration flags that can be used to control
and monitor the behavior of AF_XDP sockets.
-XDP_COPY and XDP_ZERO_COPY bind flags
--------------------------------------
+XDP_COPY and XDP_ZEROCOPY bind flags
+------------------------------------
When you bind to a socket, the kernel will first try to use zero-copy
copy. If zero-copy is not supported, it will fall back on using copy
@@ -252,20 +252,27 @@ mode, i.e. copying all packets out to user space. But if you would
like to force a certain mode, you can use the following flags. If you
pass the XDP_COPY flag to the bind call, the kernel will force the
socket into copy mode. If it cannot use copy mode, the bind call will
-fail with an error. Conversely, the XDP_ZERO_COPY flag will force the
+fail with an error. Conversely, the XDP_ZEROCOPY flag will force the
socket into zero-copy mode or fail.
XDP_SHARED_UMEM bind flag
-------------------------
-This flag enables you to bind multiple sockets to the same UMEM, but
-only if they share the same queue id. In this mode, each socket has
-their own RX and TX rings, but the UMEM (tied to the fist socket
-created) only has a single FILL ring and a single COMPLETION
-ring. To use this mode, create the first socket and bind it in the normal
-way. Create a second socket and create an RX and a TX ring, or at
-least one of them, but no FILL or COMPLETION rings as the ones from
-the first socket will be used. In the bind call, set he
+This flag enables you to bind multiple sockets to the same UMEM. It
+works on the same queue id, between queue ids and between
+netdevs/devices. In this mode, each socket has their own RX and TX
+rings as usual, but you are going to have one or more FILL and
+COMPLETION ring pairs. You have to create one of these pairs per
+unique netdev and queue id tuple that you bind to.
+
+Starting with the case were we would like to share a UMEM between
+sockets bound to the same netdev and queue id. The UMEM (tied to the
+fist socket created) will only have a single FILL ring and a single
+COMPLETION ring as there is only on unique netdev,queue_id tuple that
+we have bound to. To use this mode, create the first socket and bind
+it in the normal way. Create a second socket and create an RX and a TX
+ring, or at least one of them, but no FILL or COMPLETION rings as the
+ones from the first socket will be used. In the bind call, set he
XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
sockets this way.
@@ -283,19 +290,19 @@ round-robin example of distributing packets is shown below:
#define MAX_SOCKS 16
struct {
- __uint(type, BPF_MAP_TYPE_XSKMAP);
- __uint(max_entries, MAX_SOCKS);
- __uint(key_size, sizeof(int));
- __uint(value_size, sizeof(int));
+ __uint(type, BPF_MAP_TYPE_XSKMAP);
+ __uint(max_entries, MAX_SOCKS);
+ __uint(key_size, sizeof(int));
+ __uint(value_size, sizeof(int));
} xsks_map SEC(".maps");
static unsigned int rr;
SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
{
- rr = (rr + 1) & (MAX_SOCKS - 1);
+ rr = (rr + 1) & (MAX_SOCKS - 1);
- return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
+ return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
}
Note, that since there is only a single set of FILL and COMPLETION
@@ -305,11 +312,41 @@ concurrently. There are no synchronization primitives in the
libbpf code that protects multiple users at this point in time.
Libbpf uses this mode if you create more than one socket tied to the
-same umem. However, note that you need to supply the
+same UMEM. However, note that you need to supply the
XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
xsk_socket__create calls and load your own XDP program as there is no
built in one in libbpf that will route the traffic for you.
+The second case is when you share a UMEM between sockets that are
+bound to different queue ids and/or netdevs. In this case you have to
+create one FILL ring and one COMPLETION ring for each unique
+netdev,queue_id pair. Let us say you want to create two sockets bound
+to two different queue ids on the same netdev. Create the first socket
+and bind it in the normal way. Create a second socket and create an RX
+and a TX ring, or at least one of them, and then one FILL and
+COMPLETION ring for this socket. Then in the bind call, set he
+XDP_SHARED_UMEM option and provide the initial socket's fd in the
+sxdp_shared_umem_fd field as you registered the UMEM on that
+socket. These two sockets will now share one and the same UMEM.
+
+There is no need to supply an XDP program like the one in the previous
+case where sockets were bound to the same queue id and
+device. Instead, use the NIC's packet steering capabilities to steer
+the packets to the right queue. In the previous example, there is only
+one queue shared among sockets, so the NIC cannot do this steering. It
+can only steer between queues.
+
+In libbpf, you need to use the xsk_socket__create_shared() API as it
+takes a reference to a FILL ring and a COMPLETION ring that will be
+created for you and bound to the shared UMEM. You can use this
+function for all the sockets you create, or you can use it for the
+second and following ones and use xsk_socket__create() for the first
+one. Both methods yield the same result.
+
+Note that a UMEM can be shared between sockets on the same queue id
+and device, as well as between queues on the same device and between
+devices at the same time.
+
XDP_USE_NEED_WAKEUP bind flag
-----------------------------
@@ -342,7 +379,7 @@ would look like this for the TX path:
.. code-block:: c
if (xsk_ring_prod__needs_wakeup(&my_tx_ring))
- sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);
+ sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);
I.e., only use the syscall if the flag is set.
@@ -364,7 +401,7 @@ resources by only setting up one of them. Both the FILL ring and the
COMPLETION ring are mandatory as you need to have a UMEM tied to your
socket. But if the XDP_SHARED_UMEM flag is used, any socket after the
first one does not have a UMEM and should in that case not have any
-FILL or COMPLETION rings created as the ones from the shared umem will
+FILL or COMPLETION rings created as the ones from the shared UMEM will
be used. Note, that the rings are single-producer single-consumer, so
do not try to access them from multiple processes at the same
time. See the XDP_SHARED_UMEM section.
@@ -405,9 +442,9 @@ purposes. The supported statistics are shown below:
.. code-block:: c
struct xdp_statistics {
- __u64 rx_dropped; /* Dropped for reasons other than invalid desc */
- __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
- __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
+ __u64 rx_dropped; /* Dropped for reasons other than invalid desc */
+ __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
+ __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
};
XDP_OPTIONS getsockopt
@@ -446,15 +483,15 @@ like this:
.. code-block:: c
// struct xdp_rxtx_ring {
- // __u32 *producer;
- // __u32 *consumer;
- // struct xdp_desc *desc;
+ // __u32 *producer;
+ // __u32 *consumer;
+ // struct xdp_desc *desc;
// };
// struct xdp_umem_ring {
- // __u32 *producer;
- // __u32 *consumer;
- // __u64 *desc;
+ // __u32 *producer;
+ // __u32 *consumer;
+ // __u64 *desc;
// };
// typedef struct xdp_rxtx_ring RING;
@@ -567,6 +604,17 @@ A: The short answer is no, that is not supported at the moment. The
switch, or other distribution mechanism, in your NIC to direct
traffic to the correct queue id and socket.
+Q: My packets are sometimes corrupted. What is wrong?
+
+A: Care has to be taken not to feed the same buffer in the UMEM into
+ more than one ring at the same time. If you for example feed the
+ same buffer into the FILL ring and the TX ring at the same time, the
+ NIC might receive data into the buffer at the same time it is
+ sending it. This will cause some packets to become corrupted. Same
+ thing goes for feeding the same buffer into the FILL rings
+ belonging to different queue ids or netdevs bound with the
+ XDP_SHARED_UMEM flag.
+
Credits
=======