aboutsummaryrefslogtreecommitdiffstats
path: root/samples/bpf/xdp_rxq_info_kern.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-06-28samples/bpf: xdp_rxq_info action XDP_TX must adjust MAC-addrsJesper Dangaard Brouer1-1/+25
XDP_TX requires also changing the MAC-addrs, else some hardware may drop the TX packet before reaching the wire. This was observed with driver mlx5. If xdp_rxq_info select --action XDP_TX the swapmac functionality is activated. It is also possible to manually enable via cmdline option --swapmac. This is practical if wanting to measure the overhead of writing/updating payload for other action types. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-28samples/bpf: extend xdp_rxq_info to read packet payloadJesper Dangaard Brouer1-0/+19
There is a cost associated with reading the packet data payload that this test ignored. Add option --read to allow enabling reading part of the payload. This sample/tool helps us analyse an issue observed with a NIC mlx5 (ConnectX-5 Ex) and an Intel(R) Xeon(R) CPU E5-1650 v4. With no_touch of data: Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:no_touch XDP stats CPU pps issue-pps XDP-RX CPU 0 14,465,157 0 XDP-RX CPU 1 14,464,728 0 XDP-RX CPU 2 14,465,283 0 XDP-RX CPU 3 14,465,282 0 XDP-RX CPU 4 14,464,159 0 XDP-RX CPU 5 14,465,379 0 XDP-RX CPU total 86,789,992 When not touching data, we observe that the CPUs have idle cycles. When reading data the CPUs are 100% busy in softirq. With reading data: Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read XDP stats CPU pps issue-pps XDP-RX CPU 0 9,620,639 0 XDP-RX CPU 1 9,489,843 0 XDP-RX CPU 2 9,407,854 0 XDP-RX CPU 3 9,422,289 0 XDP-RX CPU 4 9,321,959 0 XDP-RX CPU 5 9,395,242 0 XDP-RX CPU total 56,657,828 The effect seen above is a result of cache-misses occuring when more RXQs are being used. Based on perf-event observations, our conclusion is that the CPUs DDIO (Direct Data I/O) choose to deliver packet into main memory, instead of L3-cache. We also found, that this can be mitigated by either using less RXQs or by reducing NICs the RX-ring size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-05samples/bpf: program demonstrating access to xdp_rxq_infoJesper Dangaard Brouer1-0/+96
This sample program can be used for monitoring and reporting how many packets per sec (pps) are received per NIC RX queue index and which CPU processed the packet. In itself it is a useful tool for quickly identifying RSS imbalance issues, see below. The default XDP action is XDP_PASS in-order to provide a monitor mode. For benchmarking purposes it is possible to specify other XDP actions on the cmdline --action. Output below shows an imbalance RSS case where most RXQ's deliver to CPU-0 while CPU-2 only get packets from a single RXQ. Looking at things from a CPU level the two CPUs are processing approx the same amount, BUT looking at the rx_queue_index levels it is clear that RXQ-2 receive much better service, than other RXQs which all share CPU-0. Running XDP on dev:i40e1 (ifindex:3) action:XDP_PASS XDP stats CPU pps issue-pps XDP-RX CPU 0 900,473 0 XDP-RX CPU 2 906,921 0 XDP-RX CPU total 1,807,395 RXQ stats RXQ:CPU pps issue-pps rx_queue_index 0:0 180,098 0 rx_queue_index 0:sum 180,098 rx_queue_index 1:0 180,098 0 rx_queue_index 1:sum 180,098 rx_queue_index 2:2 906,921 0 rx_queue_index 2:sum 906,921 rx_queue_index 3:0 180,098 0 rx_queue_index 3:sum 180,098 rx_queue_index 4:0 180,082 0 rx_queue_index 4:sum 180,082 rx_queue_index 5:0 180,093 0 rx_queue_index 5:sum 180,093 Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>