aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-01-05Merge branch 'xdp_rxq_info'Alexei Starovoitov36-28/+990
Jesper Dangaard Brouer says: ==================== V4: * Added reviewers/acks to patches * Fix patch desc in i40e that got out-of-sync with code * Add SPDX license headers for the two new files added in patch 14 V3: * Fixed bug in virtio_net driver * Removed export of xdp_rxq_info_init() V2: * Changed API exposed to drivers - Removed invocation of "init" in drivers, and only call "reg" (Suggested by Saeed) - Allow "reg" to fail and handle this in drivers (Suggested by David Ahern) * Removed the SINKQ qtype, instead allow to register as "unused" * Also fixed some drivers during testing on actual HW (noted in patches) There is a need for XDP to know more about the RX-queue a given XDP frames have arrived on. For both the XDP bpf-prog and kernel side. Instead of extending struct xdp_buff each time new info is needed, this patchset takes a different approach. Struct xdp_buff is only extended with a pointer to a struct xdp_rxq_info (allowing for easier extending this later). This xdp_rxq_info contains information related to how the driver have setup the individual RX-queue's. This is read-mostly information, and all xdp_buff frames (in drivers napi_poll) point to the same xdp_rxq_info (per RX-queue). We stress this data/cache-line is for read-mostly info. This is NOT for dynamic per packet info, use the data_meta for such use-cases. This patchset start out small, and only expose ingress_ifindex and the RX-queue index to the XDP/BPF program. Access to tangible info like the ingress ifindex and RX queue index, is fairly easy to comprehent. The other future use-cases could allow XDP frames to be recycled back to the originating device driver, by providing info on RX device and queue number. As XDP doesn't have driver feature flags, and eBPF code due to bpf-tail-calls cannot determine that XDP driver invoke it, this patchset have to update every driver that support XDP. For driver developers (review individual driver patches!): The xdp_rxq_info is tied to the drivers RX-ring(s). Whenever a RX-ring modification require (temporary) stopping RX frames, then the xdp_rxq_info should (likely) also be unregistred and re-registered, especially if reallocating the pages in the ring. Make sure ethtool set_channels does the right thing. When replacing XDP prog, if and only if RX-ring need to be changed, then also re-register the xdp_rxq_info. I'm Cc'ing the individual driver patches to the registered maintainers. Testing: I've only tested the NIC drivers I have hardware for. The general test procedure is to (DUT = Device Under Test): (1) run pktgen script pktgen_sample04_many_flows.sh (against DUT) (2) run samples/bpf program xdp_rxq_info --dev $DEV (on DUT) (3) runtime modify number of NIC queues via ethtool -L (on DUT) (4) runtime modify number of NIC ring-size via ethtool -G (on DUT) Patch based on git tree bpf-next (at commit fb982666e380c1632a): https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05samples/bpf: program demonstrating access to xdp_rxq_infoJesper Dangaard Brouer3-0/+631
This sample program can be used for monitoring and reporting how many packets per sec (pps) are received per NIC RX queue index and which CPU processed the packet. In itself it is a useful tool for quickly identifying RSS imbalance issues, see below. The default XDP action is XDP_PASS in-order to provide a monitor mode. For benchmarking purposes it is possible to specify other XDP actions on the cmdline --action. Output below shows an imbalance RSS case where most RXQ's deliver to CPU-0 while CPU-2 only get packets from a single RXQ. Looking at things from a CPU level the two CPUs are processing approx the same amount, BUT looking at the rx_queue_index levels it is clear that RXQ-2 receive much better service, than other RXQs which all share CPU-0. Running XDP on dev:i40e1 (ifindex:3) action:XDP_PASS XDP stats CPU pps issue-pps XDP-RX CPU 0 900,473 0 XDP-RX CPU 2 906,921 0 XDP-RX CPU total 1,807,395 RXQ stats RXQ:CPU pps issue-pps rx_queue_index 0:0 180,098 0 rx_queue_index 0:sum 180,098 rx_queue_index 1:0 180,098 0 rx_queue_index 1:sum 180,098 rx_queue_index 2:2 906,921 0 rx_queue_index 2:sum 906,921 rx_queue_index 3:0 180,098 0 rx_queue_index 3:sum 180,098 rx_queue_index 4:0 180,082 0 rx_queue_index 4:sum 180,082 rx_queue_index 5:0 180,093 0 rx_queue_index 5:sum 180,093 Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05bpf: finally expose xdp_rxq_info to XDP bpf-programsJesper Dangaard Brouer2-0/+22
Now all XDP driver have been updated to setup xdp_rxq_info and assign this to xdp_buff->rxq. Thus, it is now safe to enable access to some of the xdp_rxq_info struct members. This patch extend xdp_md and expose UAPI to userspace for ingress_ifindex and rx_queue_index. Access happens via bpf instruction rewrite, that load data directly from struct xdp_rxq_info. * ingress_ifindex map to xdp_rxq_info->dev->ifindex * rx_queue_index map to xdp_rxq_info->queue_index Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05xdp: generic XDP handling of xdp_rxq_infoJesper Dangaard Brouer2-10/+61
Hook points for xdp_rxq_info: * reg : netif_alloc_rx_queues * unreg: netif_free_rx_queues The net_device have some members (num_rx_queues + real_num_rx_queues) and data-area (dev->_rx with struct netdev_rx_queue's) that were primarily used for exporting information about RPS (CONFIG_RPS) queues to sysfs (CONFIG_SYSFS). For generic XDP extend struct netdev_rx_queue with the xdp_rxq_info, and remove some of the CONFIG_SYSFS ifdefs. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05virtio_net: setup xdp_rxq_infoJesper Dangaard Brouer1-1/+13
The virtio_net driver doesn't dynamically change the RX-ring queue layout and backing pages, but instead reject XDP setup if all the conditions for XDP is not meet. Thus, the xdp_rxq_info also remains fairly static. This allow us to simply add the reg/unreg to net_device open/close functions. Driver hook points for xdp_rxq_info: * reg : virtnet_open * unreg: virtnet_close V3: - bugfix, also setup xdp.rxq in receive_mergeable() - Tested bpf-sample prog inside guest on a virtio_net device Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05tun: setup xdp_rxq_infoJesper Dangaard Brouer1-1/+23
Driver hook points for xdp_rxq_info: * reg : tun_attach * unreg: __tun_detach I've done some manual testing of this tun driver, but I would appriciate good review and someone else running their use-case tests, as I'm not 100% sure I understand the tfile->detached semantics. V2: Removed the skb_array_cleanup() call from V1 by request from Jason Wang. Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05thunderx: setup xdp_rxq_infoJesper Dangaard Brouer3-4/+13
This driver uses a bool scheme for "enable"/"disable" when setting up different resources. Thus, the hook points for xdp_rxq_info is done in the same function call nicvf_rcv_queue_config(). This is activated through enable/disable via nicvf_config_data_transfer(), which is tied into nicvf_stop()/nicvf_open(). Extending driver packet handler call-path nicvf_rcv_pkt_handler() with a pointer to the given struct rcv_queue, in-order to access the xdp_rxq_info data area (in nicvf_xdp_rx()). V2: Driver have no proper error path for failed XDP RX-queue info reg, as nicvf_rcv_queue_config is a void function. Cc: linux-arm-kernel@lists.infradead.org Cc: Sunil Goutham <sgoutham@cavium.com> Cc: Robert Richter <rric@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05nfp: setup xdp_rxq_infoJesper Dangaard Brouer2-3/+12
Driver hook points for xdp_rxq_info: * reg : nfp_net_rx_ring_alloc * unreg: nfp_net_rx_ring_free In struct nfp_net_rx_ring moved member @size into a hole on 64-bit. Thus, the size remaines the same after adding member @xdp_rxq. Cc: oss-drivers@netronome.com Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05bnxt_en: setup xdp_rxq_infoJesper Dangaard Brouer3-0/+13
Driver hook points for xdp_rxq_info: * reg : bnxt_alloc_rx_rings * unreg: bnxt_free_rx_rings This driver should be updated to re-register when changing allocation mode of RX rings. Tested on actual hardware. Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05mlx4: setup xdp_rxq_infoJesper Dangaard Brouer3-5/+15
Driver hook points for xdp_rxq_info: * reg : mlx4_en_create_rx_ring * unreg: mlx4_en_destroy_rx_ring Tested on actual hardware. Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_regJesper Dangaard Brouer5-0/+20
The driver code qede_free_fp_array() depend on kfree() can be called with a NULL pointer. This stems from the qede_alloc_fp_array() function which either (kz)alloc memory for fp->txq or fp->rxq. This also simplifies error handling code in case of memory allocation failures, but xdp_rxq_info_unreg need to know the difference. Introduce xdp_rxq_info_is_reg() to handle if a memory allocation fails and detect this is the failure path by seeing that xdp_rxq_info was not registred yet, which first happens after successful alloaction in qede_init_fp(). Driver hook points for xdp_rxq_info: * reg : qede_init_fp * unreg: qede_free_fp_array Tested on actual hardware with samples/bpf program. V2: Driver have no proper error path for failed XDP RX-queue info reg, as qede_init_fp() is a void function. Cc: everest-linux-l2@cavium.com Cc: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05ixgbe: setup xdp_rxq_infoJesper Dangaard Brouer3-1/+15
Driver hook points for xdp_rxq_info: * reg : ixgbe_setup_rx_resources() * unreg: ixgbe_free_rx_resources() Tested on actual hardware. V2: Fix ixgbe_set_ringparam, clear xdp_rxq_info in temp_ring Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05i40e: setup xdp_rxq_infoJesper Dangaard Brouer3-2/+21
The i40e driver has a special "FDIR" RX-ring (I40E_VSI_FDIR) which is a sideband channel for configuring/updating the flow director tables. This (i40e_vsi_)type does not invoke XDP-ebpf code. As suggested by Björn (V2): Instead of marking this I40E_VSI_FDIR RX-ring a special case, reverse the logic and only select RX-rings of type I40E_VSI_MAIN to register xdp_rxq_info's for. Driver hook points for xdp_rxq_info: * reg : i40e_setup_rx_descriptors (via i40e_vsi_setup_rx_resources) * unreg: i40e_free_rx_resources (via i40e_vsi_free_rx_resources) Tested on actual hardware with samples/bpf program. V2: Fixed bug in i40e_set_ringparam (memset zero) + match on I40E_VSI_MAIN. V4: Update patch desc that got out-of-sync with code. Cc: intel-wired-lan@lists.osuosl.org Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05xdp/mlx5: setup xdp_rxq_infoJesper Dangaard Brouer3-0/+14
The mlx5 driver have a special drop-RQ queue (one per interface) that simply drops all incoming traffic. It helps driver keep other HW objects (flow steering) alive upon down/up operations. It is temporarily pointed by flow steering objects during the interface setup, and when interface is down. It lacks many fields that are set in a regular RQ (for example its state is never switched to MLX5_RQC_STATE_RDY). (Thanks to Tariq Toukan for explanation). The XDP RX-queue info for this drop-RQ marked as unused, which allow us to use the same takedown/free code path as other RX-queues. Driver hook points for xdp_rxq_info: * reg : mlx5e_alloc_rq() * unused: mlx5e_alloc_drop_rq() * unreg : mlx5e_free_rq() Tested on actual hardware with samples/bpf program Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Matan Barak <matanb@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05xdp: base API for new XDP rx-queue info conceptJesper Dangaard Brouer4-1/+117
This patch only introduce the core data structures and API functions. All XDP enabled drivers must use the API before this info can used. There is a need for XDP to know more about the RX-queue a given XDP frames have arrived on. For both the XDP bpf-prog and kernel side. Instead of extending xdp_buff each time new info is needed, the patch creates a separate read-mostly struct xdp_rxq_info, that contains this info. We stress this data/cache-line is for read-only info. This is NOT for dynamic per packet info, use the data_meta for such use-cases. The performance advantage is this info can be setup at RX-ring init time, instead of updating N-members in xdp_buff. A possible (driver level) micro optimization is that xdp_buff->rxq assignment could be done once per XDP/NAPI loop. The extra pointer deref only happens for program needing access to this info (thus, no slowdown to existing use-cases). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05nfp: add basic multicast filteringJakub Kicinski1-2/+5
We currently always pass all multicast traffic through. Only set L2MC when actually needed. Since the driver was not making use of the capability to filter out mcast frames, some FW projects don't implement it any more. Don't warn users if capability is not present (like we do for promisc flag). The lack of L2MC capability is assumed to mean all multicast traffic goes through. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05Merge branch 'rds-use-RCU-between-work-enqueue-and-connection-teardown'David S. Miller8-24/+86
Sowmini Varadhan says: ==================== rds: use RCU between work-enqueue and connection teardown This patchset follows up on the root-cause mentioned in https://www.spinics.net/lists/netdev/msg472849.html Patch1 implements some code refactoring that was suggeseted as an enhancement in http://patchwork.ozlabs.org/patch/843157/ It replaces the c_destroy_in_prog bit in rds_connection with an atomically managed flag in rds_conn_path. Patch2 builds on Patch1 and uses RCU to make sure that work is only enqueued if the connection destroy is not already in progress: the test-flag-and-enqueue is done under rcu_read_lock, while destroy first sets the flag, uses synchronize_rcu to wait for existing reader threads to complete, and then starts all the work-cancellation. Since I have not been able to reproduce the original stack traces reported by syszbot, and these are fixes for a race condition that are based on code-inspection I am not marking these as reported-by at this time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05rds: use RCU to synchronize work-enqueue with connection teardownSowmini Varadhan6-20/+81
rds_sendmsg() can enqueue work on cp_send_w from process context, but it should not enqueue this work if connection teardown has commenced (else we risk enquing work after rds_conn_path_destroy() has assumed that all work has been cancelled/flushed). Similarly some other functions like rds_cong_queue_updates and rds_tcp_data_ready are called in softirq context, and may end up enqueuing work on rds_wq after rds_conn_path_destroy() has assumed that all workqs are quiesced. Check the RDS_DESTROY_PENDING bit and use rcu synchronization to avoid all these races. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05rds: Use atomic flag to track connections being destroyedSowmini Varadhan3-6/+7
Replace c_destroy_in_prog by using a bit in cp_flags that can set/tested atomically. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05Merge branch 'tipc-two-small-cleanups'David S. Miller1-29/+24
Jon Maloy says: ==================== tipc: two small cleanups These two commits are based on commit f9c935db8086 ("tipc: fix problems with multipoint-to-point flow control") which has been applied to 'net' but not yet to 'net-next'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05tipc: simplify small window members' sorting algorithmJon Maloy1-9/+4
We simplify the sorting algorithm in tipc_update_member(). We also make the remaining conditional call to this function unconditional, since the same condition now is tested for inside the said function. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05tipc: some clarifying name changesJon Maloy1-23/+23
We rename some functions and variables, to make their purpose clearer. - tipc_group::congested -> tipc_group::small_win. Members in this list are not necessarily (and typically) congested. Instead, they may *potentially* be subject to congestion because their send window is less than ADV_IDLE, and therefore need to be checked during message transmission. - tipc_group_is_receiver() -> tipc_group_is_sender(). This socket will accept messages coming from members fulfilling this condition, i.e., they are senders from this member's viewpoint. - tipc_group_is_enabled() -> tipc_group_is_receiver(). Members fulfilling this condition will accept messages sent from the current socket, i.e., they are receivers from its viewpoint. There are no functional changes in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05net: dsa: lan9303: Fix error return code in lan9303_check_device()Wei Yongjun1-1/+1
Fix to return error code -ENODEV from the chip not found error handling case instead of 0(ret have been overwritten to 0 by lan9303_read()), as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05Merge branch 'dsa-Move-padding-into-Broadcom-tagger'David S. Miller3-27/+12
Florian Fainelli says: ==================== net: dsa: Move padding into Broadcom tagger This patch series moves the padding of short packets to where it belongs within the DSA Broadcom tagger code, I just found myself doing this for a third driver, which was a clear indication this was wrong and did not scale. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05net: bgmac: Remove short packet padding for DSAFlorian Fainelli1-15/+0
DSA now correctly pads short packets within net/dsa/tag_brcm.c such that this it is no longer necessary to do this within bgmac. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05net: systemport: Remove short packet paddingFlorian Fainelli1-12/+0
Short packet padding added to the driver is only necessary when using Broadcom tags, but since this is now taken care of net/dsa/tag_brcm.c, we are guaranteed being given correctly padded packets. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05net: dsa: Move padding into Broadcom taggerFlorian Fainelli1-0/+12
Instead of having the different master network device drivers potentially used by DSA/Broadcom tags, move the padding necessary for the switches to accept short packets where it makes most sense: within tag_brcm.c. This avoids multiplying the number of similar commits to e.g: bgmac, bcmsysport, etc. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05net: sched: fix tcf_block_get_ext() in case CONFIG_NET_CLS is not setQuentin Monnet1-1/+2
The definition of functions tcf_block_get() and tcf_block_get_ext() depends of CONFIG_NET_CLS being set. When those functions gained extack support, only one version of the declaration of those functions was updated. Function tcf_block_get() was later fixed with commit 3c1490913f3b ("net: sch: api: fix tcf_block_get"). Change arguments of tcf_block_get_ext() for the case when CONFIG_NET_CLS is not set. Fixes: 8d1a77f974ca ("net: sch: api: add extack support in tcf_block_get") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05net: revert "Update RFS target at poll for tcp/udp"Soheil Hassas Yeganeh2-4/+0
On multi-threaded processes, one common architecture is to have one (or a small number of) threads polling sockets, and a considerably larger pool of threads reading form and writing to the sockets. When we set RPS core on tcp_poll() or udp_poll() we essentially steer all packets of all the polled FDs to one (or small number of) cores, creaing a bottleneck and/or RPS misprediction. Another common architecture is to shard FDs among threads pinned to cores. In such a setting, setting RPS core in tcp_poll() and udp_poll() is redundant because the RFS core is correctly set in recvmsg and sendmsg. Thus, revert the following commit: c3f1dbaf6e28 ("net: Update RFS target at poll for tcp/udp"). Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05ip: do not set RFS core on error queue readsSoheil Hassas Yeganeh1-1/+2
We should only record RPS on normal reads and writes. In single threaded processes, all calls record the same state. In multi-threaded processes where a separate thread processes errors, the RFS table mispredicts. Note that, when CONFIG_RPS is disabled, sock_rps_record_flow is a noop and no branch is added as a result of this patch. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05Merge branch 'l2tp-remove-configurable-offset-parameters'David S. Miller5-51/+7
James Chapman says: ==================== l2tp: remove configurable offset parameters This patch series removes all code to support a configurable offset in transmitted l2tp packets. Code to handle this is incomplete and buggy and has been this way for years. If anyone tried to configure an offset, it would be ignored for L2TPv2 tunnels, or for L2TPv3 tunnels, could result in L2TPv3 packets being transmitted which are not compliant with L2TPv3 RFC3931. This patch series removes the support for configurable offsets. No known userspace l2tp daemon configures an offset. However, iproute2's "ip l2tp" command has an offset parameter and if set, the value is passed to the kernel. This is the most likely use case where offsets might be configured, e.g. ip l2tp add tunnel local 1.1.1.1 remote 1.1.1.2 tunnel_id 1 \ peer_tunnel_id 2 encap ip ip l2tp add session name l2tp0 tunnel_id 1 session_id 1 \ peer_session_id 2 offset 8 The above would result in packets being transmitted to 1.1.1.2 with 8 bytes padding between the L2TPv3 header and the payload. The peer would need to be configured with the same offset value. However, the packets are not compliant with the L2TPv3 RFC, hence I think it's unlikely that offset is being used. With this patch series applied, the offset would not be configured. The peer would need to be modified to remove its offset setting too. iproute2 should be modified to remove or ignore the ip l2tp offset parameter. This issue was discovered when reviewing a patch series from lorenzo.bianconi@redhat.com which adds another netlink attribute to configure the expected offset in received L2TPv3 packets. This change is reverted by this series because offsets do not exist in L2TPv3 packets. These commits are: commit f15bc54eeecd ("l2tp: add peer_offset parameter") commit 820da5357572 ("l2tp: fix missing print session offset info") In more detail: The L2TPv2 protocol supports a variable offset from the L2TPv2 header to the payload to give the sender implementation some flexibility for data alignment when adding L2TP headers on to payloads. The offset value is indicated by an optional field in the L2TP header. Our L2TP implementation already detects the presence of the optional offset in received packets and skips those bytes when parsing packets. All transmitted L2TPv2 packets are always transmitted with no offset. L2TPv3 has no optional offset field in the L2TPv3 packet header. Instead, L2TPv3 defines optional fields in a "Layer-2 Specific Sublayer". At the time when the original L2TP code was written, there was talk at IETF of offset being implemented in a new Layer-2 Specific Sublayer. A L2TP_ATTR_OFFSET netlink attribute was added so that this offset could be configured and the intention was to allow it to be also used to set the tx offset for L2TPv2. However, no L2TPv3 offset was ever specified and the L2TP_ATTR_OFFSET parameter was forgotten about. Setting L2TP_ATTR_OFFSET results in L2TPv3 packets being transmitted with the specified number of bytes padding between L2TPv3 header and payload. This is not compliant with L2TPv3 RFC3931. So this change removes the configurable offset altogether while retaining L2TP_ATTR_OFFSET in the API for backwards compatibility. If L2TP_ATTR_OFFSET is given, its value is now silently ignored. ==================== Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05l2tp: add comment in API header that L2TP_ATTR_OFFSET is not usedJames Chapman1-1/+1
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05l2tp: remove configurable payload offsetJames Chapman4-18/+6
If L2TP_ATTR_OFFSET is set to a non-zero value in L2TPv3 tunnels, it results in L2TPv3 packets being transmitted which might not be compliant with the L2TPv3 RFC. This patch has l2tp ignore the offset setting and send all packets with no offset. In more detail: L2TPv2 supports a variable offset from the L2TPv2 header to the payload. The offset value is indicated by an optional field in the L2TP header. Our L2TP implementation already detects the presence of the optional offset and skips that many bytes when handling data received packets. All transmitted packets are always transmitted with no offset. L2TPv3 has no optional offset field in the L2TPv3 packet header. Instead, L2TPv3 defines optional fields in a "Layer-2 Specific Sublayer". At the time when the original L2TP code was written, there was talk at IETF of offset being implemented in a new Layer-2 Specific Sublayer. A L2TP_ATTR_OFFSET netlink attribute was added so that this offset could be configured and the intention was to allow it to be also used to set the tx offset for L2TPv2. However, no L2TPv3 offset was ever specified and the L2TP_ATTR_OFFSET parameter was forgotten about. Setting L2TP_ATTR_OFFSET results in L2TPv3 packets being transmitted with the specified number of bytes padding between L2TPv3 header and payload. This is not compliant with L2TPv3 RFC3931. This change removes the configurable offset altogether while retaining L2TP_ATTR_OFFSET for backwards compatibility. Any L2TP_ATTR_OFFSET value is ignored. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05l2tp: revert "l2tp: fix missing print session offset info"James Chapman1-2/+0
Revert commit 820da5357572 ("l2tp: fix missing print session offset info"). The peer_offset parameter is removed. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05l2tp: revert "l2tp: add peer_offset parameter"James Chapman5-38/+8
Revert commit f15bc54eeecd ("l2tp: add peer_offset parameter"). This is removed because it is adding another configurable offset and configurable offsets are being removed. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05net/mlx5e: hide an unused variableArnd Bergmann1-1/+1
The uplink_rpriv variable was added at the start of the function but only used inside of an #ifdef: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'mlx5e_route_lookup_ipv6': drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:1549:25: error: unused variable 'uplink_rpriv' [-Werror=unused-variable] This moves the declaration into that #ifdef as well. Fixes: 5ed99fb421d4 ("net/mlx5e: Move ethernet representors data into separate struct") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-04Merge tag 'mac80211-next-for-davem-2018-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-nextDavid S. Miller30-126/+219
Johannes Berg says: ==================== We have things all over the place, no point listing them. One thing is notable: I applied two patches and later reverted them - we'll get back to that once all the driver situation is sorted out. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-04tg3: Add Macronix NVRAM supportPrashant Sreedharan2-4/+31
This patch adds the support for Macronix NVRAM Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com> Signed-off-by: Satish Baddipadige <satish.baddipadige@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-04Merge branch 'dsa-lan9303-phy_addr_sel_strap-rename-and-retype'David S. Miller2-13/+13
Egil Hjelmeland says: ==================== net: dsa: lan9303: phy_addr_sel_strap rename and retype Non functional cleanups involving chip->phy_addr_sel_strap. As promised in https://lkml.org/lkml/2017/11/6/273 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-04net: dsa: lan9303: Adjust phy_addr_base expressionsEgil Hjelmeland1-5/+5
Simplify calculation of chip->phy_addr_base in lan9303_detect_phy_setup(). Use GENMASK to calculate phys_mii_mask from LAN9303_NUM_PORTS and phy_addr_base. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-04net: dsa: lan9303: phy_addr_sel_strap rename and retypeEgil Hjelmeland2-11/+11
chip->phy_addr_sel_strap is declared as a bool, but is also used as an integer address base. Rename 'phy_addr_sel_strap' to 'phy_addr_base', and change type to int. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-04bpf: only build sockmap with CONFIG_INETJohn Fastabend3-2/+4
The sockmap infrastructure is only aware of TCP sockets at the moment. In the future we plan to add UDP. In both cases CONFIG_NET should be built-in. So lets only build sockmap if CONFIG_INET is enabled. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-04bpf: sockmap remove unused functionJohn Fastabend1-8/+0
This was added for some work that was eventually factored out but the helper call was missed. Remove it now and add it back later if needed. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-04mac80211: Fix setting TX power on monitor interfacesPeter Große2-2/+29
Instead of calling ieee80211_recalc_txpower on monitor interfaces directly, call it using the virtual monitor interface, if one exists. In case of a single monitor interface given, reject setting TX power, if no virtual monitor interface exists. That being checked, don't warn in ieee80211_bss_info_change_notify, after setting TX power on a monitor interface. Fixes warning: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2193 at net/mac80211/driver-ops.h:167 ieee80211_bss_info_change_notify+0x111/0x190 Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core rndis_host cdc_ether usbnet mii tp_smapi(O) thinkpad_ec(O) ohci_hcd vboxpci(O) vboxnetadp(O) vboxnetflt(O) v boxdrv(O) x86_pkg_temp_thermal kvm_intel kvm irqbypass iwldvm iwlwifi ehci_pci ehci_hcd tpm_tis tpm_tis_core tpm CPU: 0 PID: 2193 Comm: iw Tainted: G O 4.12.12-gentoo #2 task: ffff880186fd5cc0 task.stack: ffffc90001b54000 RIP: 0010:ieee80211_bss_info_change_notify+0x111/0x190 RSP: 0018:ffffc90001b57a10 EFLAGS: 00010246 RAX: 0000000000000006 RBX: ffff8801052ce840 RCX: 0000000000000064 RDX: 00000000fffffffc RSI: 0000000000040000 RDI: ffff8801052ce840 RBP: ffffc90001b57a38 R08: 0000000000000062 R09: 0000000000000000 R10: ffff8802144b5000 R11: ffff880049dc4614 R12: 0000000000040000 R13: 0000000000000064 R14: ffff8802105f0760 R15: ffffc90001b57b48 FS: 00007f92644b4580(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9263c109f0 CR3: 00000001df850000 CR4: 00000000000406f0 Call Trace: ieee80211_recalc_txpower+0x33/0x40 ieee80211_set_tx_power+0x40/0x180 nl80211_set_wiphy+0x32e/0x950 Reported-by: Peter Große <pegro@friiks.de> Signed-off-by: Peter Große <pegro@friiks.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-01-04Merge branch 'bpf-bpftool-misc-fixes'Daniel Borkmann8-37/+40
Jakub Kicinski says: ==================== This series addresses small issues that snuck through the review of cgroup code. "list" and "show" are now made aliases to satisfy all users. Small fix to errors printed is needed, errors can't contain new line characters, otherwise JSON will break. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-04tools: bpftool: remove new lines from errorsJakub Kicinski2-11/+11
It's a little bit unusual for kernel style, but we add the new line character to error strings inside the p_err() function. We do this because new lines at the end of error strings will break JSON output. Fix a few p_err("..\n") which snuck in recently. Fixes: 5ccda64d38cc ("bpftool: implement cgroup bpf operations") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-04tools: bpftool: alias show and list commandsJakub Kicinski8-19/+22
iproute2 seems to accept show and list as aliases. Let's do the same thing, and by allowing both bring cgroup syntax back in line with maps and progs. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-04tools: bpftool: rename cgroup list -> show in the codeJakub Kicinski1-9/+9
So far we have used "show" as a keyword for listing programs and maps. Use the word "show" in the code for cgroups too, next commit will alias show and list. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-03Merge branch 'nfp-flower-repr-link-state'David S. Miller6-8/+189
Jakub Kicinski says: ==================== nfp: flower: repr link state Dirk says: This series provides two updates towards the link state of reprs in the flower nfp app. Patch #1 improves the way link state is reported for reprs. Instead of starting with an assumed 'UP' state, always assume the link state is 'DOWN' and then modify this only on events received from firmware. Patch #2 adds a new nfp_app hook, repr_preclean. This callback is executed before reprs are removed from the app context and is executed per repr. Patch #3 implements the new REIFY control message, used to indicate when reprs are created and destroyed. Firmware uses these messages to prevent communication about any particular port when the driver doesn't know about the repr yet or anymore. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-03nfp: flower: implement the PORT_REIFY messageDirk van der Merwe4-3/+162
The PORT_REIFY message indicates whether reprs have been created or when they are about to be destroyed. This is necessary so firmware can know which state the driver is in, e.g. the firmware will not send any control messages related to ports when the reprs are destroyed. This prevents nuisance warning messages printed whenever the firmware sends updates for non-existent reprs. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>