Age | Commit message (Collapse) | Author | Files | Lines |
|
We currently notify a VF of the link state after ENABLE_QUEUES, which is
the last thing a VF does after being configured. Guests may not actually
ENABLE_QUEUES until they get configured, and thus between driver load
and device configuration the VF may show inaccurate link status.
Fix this by also sending the link state after GET_VF_RESOURCES. Although
we could remove the message following ENABLE_QUEUES, it's not that
significant of a loss, so this patch just keeps both to ensure maximum
compatibility with guests on various OSes.
Specifically, without this patch guests running FreeBSD will display
inaccurate link state until the device is brought up. This is mostly
a cosmetic issue but can be confusing to system administrators.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
If i40evf_open() is called quickly at the same time as a reset occurs
(such as via ethtool) it is possible for the device to attempt to open
while a reset is in progress. This occurs because the driver was not
holding the critical task bit lock during i40evf_open, nor was it
holding it around the call to i40evf_up_complete() in
i40evf_reset_task().
We didn't hold the lock previously because calls to i40evf_down() would
take the bit lock directly, and this would have caused a deadlock.
To avoid this, we'll move the bit lock handling out of i40evf_down() and
into the callers of this function. Additionally, we'll now hold the bit
lock over the entire set of steps when going up or down, to ensure that
we remain consistent.
Ultimately this causes us to serialize the transitions between down and
up properly, and avoid changing status while we're resetting.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Although not strictly necessary, it is customary to reverse the order in
which we release locks that we acquire. This helps preserve lock
ordering during future refactors, which can help avoid potential
deadlock situations.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Stop overloading the __I40EVF_IN_CRITICAL_TASK bit lock to protect the
mac_filter_list and vlan_filter_list. Instead, implement a spinlock to
protect these two lists, similar to how we protect the hash in the i40e
PF code.
Ensure that every place where we access the list uses the spinlock to
ensure consistency, and stop holding the critical section around blocks
of code which only need access to the macvlan filter lists.
This refactor helps simplify the locking behavior, and is necessary as
a future refactor to the __I40EVF_IN_CRITICAL_TASK would cause
a deadlock otherwise.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
In i40evf_reset_task we use netif_running() to determine whether or not
the device is currently up. This allows us to properly free queue memory
and shut down things before we request the hardware reset.
It turns out that we cannot be guaranteed of netif_running() returning
false until the device is fully up, as the kernel core code sets
__LINK_STATE_START prior to calling .ndo_open. Since we're not holding
the rtnl_lock(), it's possible that the driver's i40evf_open handler
function is currently being called while we're resetting.
We can't simply hold the rtnl_lock() while checking netif_running() as
this could cause a deadlock with the i40evf_open() function.
Additionally, we can't avoid the deadlock by holding the rtnl_lock()
over the whole reset path, as this essentially serializes all resets,
and can cause massive delays if we have multiple VFs on a system.
Instead, lets just check our own internal state __I40EVF_RUNNING state
field. This allows us to ensure that the state is correct and is only
set after we've finished bringing the device up.
Without this change we might free data structures about device queues
and other memory before they've been fully allocated.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Display some more stats that were already being counted, to help users
understand when priority xon/xoff packets are being sent/received
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The commit e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info")
removed some ifdef CONFIG_SYSFS in net/core/dev.c, but forgot to
remove the corresponding ifdef's in include/linux/netdevice.h.
Fixes: e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The internal PHYs in the mv88e6390 switch have a temperature sensor.
It uses a different register layout to other PHY currently supported.
It also has an errata, in that some reads of the sensor result in bad
values. So a number of reads need to be made, and the average taken.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.
This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify the arguments net_dim() by formatting them into a struct
net_dim_sample before calling the function.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Suggested-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupts events per ring. A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation. Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These functions were identified as ones that could be made generic and
used by multiple drivers. Most of the contents of en_rx_am.c are moved
to net_dim.c.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
More movement to help make this code more generic.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move these to newly created file to prepare to move these functions to a
library.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The use of hash-threshold instead of modulo-N makes it trivial to add
support for non-equal-cost multipath.
Instead of dividing the multipath hash function's output space equally
between the nexthops, each nexthop is assigned a region size which is
proportional to its weight.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that each nexthop stores its region boundary in the multipath hash
function's output space, we can use hash-threshold instead of modulo-N
in multipath selection.
This reduces the number of checks we need to perform during lookup, as
dead and linkdown nexthops are assigned a negative region boundary. In
addition, in contrast to modulo-N, only flows near region boundaries are
affected when a nexthop is added or removed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The hash thresholds assigned to IPv6 nexthops are in the range of
[-1, 2^31 - 1], where a negative value is assigned to nexthops that
should not be considered during multipath selection.
Therefore, in a similar fashion to IPv4, we need to use the upper
31-bits of the multipath hash for multipath selection.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before we convert IPv6 to use hash-threshold instead of modulo-N, we
first need each nexthop to store its region boundary in the hash
function's output space.
The boundary is calculated by dividing the output space equally between
the different active nexthops. That is, nexthops that are not dead or
linkdown.
The boundaries are rebalanced whenever a nexthop is added or removed to
a multipath route and whenever a nexthop becomes active or inactive.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch tries to batched used ring update during RX. This is pretty
fit for the case when guest is much faster (e.g dpdk based
backend). In this case, used ring is almost empty:
- we may get serious cache line misses/contending on both used ring
and used idx.
- at most 1 packet could be dequeued at one time, batching in guest
does not make much effect.
Update used ring in a batch can help since guest won't access the used
ring until used idx was advanced for several descriptors and since we
advance used ring for every N packets, guest will only need to access
used idx for every N packet since it can cache the used idx. To have a
better interaction for both batch dequeuing and dpdk batching,
VHOST_RX_BATCH was used as the maximum number of descriptors that
could be batched.
Test were done between two machines with 2.40GHz Intel(R) Xeon(R) CPU
E5-2630 connected back to back through ixgbe. Traffic were generated
on one remote ixgbe through MoonGen and measure the RX pps through
testpmd in guest when do xdp_redirect_map from local ixgbe to
tap. RX pps were increased from 3.05 Mpps to 4.00 Mpps (about 31%
improvement).
One possible concern for this is the implications for TCP (especially
latency sensitive workload). Result[1] does not show obvious changes
for most of the netperf test (RR, TX, and RX). And we do get some
improvements for RX on some specific size.
Guest RX:
size/sessions/+thu%/+normalize%
64/ 1/ +2%/ +2%
64/ 2/ +2%/ -1%
64/ 4/ +1%/ +1%
64/ 8/ 0%/ 0%
256/ 1/ +6%/ -3%
256/ 2/ -3%/ +2%
256/ 4/ +11%/ +11%
256/ 8/ 0%/ 0%
512/ 1/ +4%/ 0%
512/ 2/ +2%/ +2%
512/ 4/ 0%/ -1%
512/ 8/ -8%/ -8%
1024/ 1/ -7%/ -17%
1024/ 2/ -8%/ -7%
1024/ 4/ +1%/ 0%
1024/ 8/ 0%/ 0%
2048/ 1/ +30%/ +14%
2048/ 2/ +46%/ +40%
2048/ 4/ 0%/ 0%
2048/ 8/ 0%/ 0%
4096/ 1/ +23%/ +22%
4096/ 2/ +26%/ +23%
4096/ 4/ 0%/ +1%
4096/ 8/ 0%/ 0%
16384/ 1/ -2%/ -3%
16384/ 2/ +1%/ -4%
16384/ 4/ -1%/ -3%
16384/ 8/ 0%/ -1%
65535/ 1/ +15%/ +7%
65535/ 2/ +4%/ +7%
65535/ 4/ 0%/ +1%
65535/ 8/ 0%/ 0%
TCP_RR:
size/sessions/+thu%/+normalize%
1/ 1/ 0%/ +1%
1/ 25/ +2%/ +1%
1/ 50/ +4%/ +1%
64/ 1/ 0%/ -4%
64/ 25/ +2%/ +1%
64/ 50/ 0%/ -1%
256/ 1/ 0%/ 0%
256/ 25/ 0%/ 0%
256/ 50/ +4%/ +2%
Guest TX:
size/sessions/+thu%/+normalize%
64/ 1/ +4%/ -2%
64/ 2/ -6%/ -5%
64/ 4/ +3%/ +6%
64/ 8/ 0%/ +3%
256/ 1/ +15%/ +16%
256/ 2/ +11%/ +12%
256/ 4/ +1%/ 0%
256/ 8/ +5%/ +5%
512/ 1/ -1%/ -6%
512/ 2/ 0%/ -8%
512/ 4/ -2%/ +4%
512/ 8/ +6%/ +9%
1024/ 1/ +3%/ +1%
1024/ 2/ +3%/ +9%
1024/ 4/ 0%/ +7%
1024/ 8/ 0%/ +7%
2048/ 1/ +8%/ +2%
2048/ 2/ +3%/ -1%
2048/ 4/ -1%/ +11%
2048/ 8/ +3%/ +9%
4096/ 1/ +8%/ +8%
4096/ 2/ 0%/ -7%
4096/ 4/ +4%/ +4%
4096/ 8/ +2%/ +5%
16384/ 1/ -3%/ +1%
16384/ 2/ -1%/ -12%
16384/ 4/ -1%/ +5%
16384/ 8/ 0%/ +1%
65535/ 1/ 0%/ -3%
65535/ 2/ +5%/ +16%
65535/ 4/ +1%/ +2%
65535/ 8/ +1%/ -1%
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function type should be on the same line with the function
name, or it may cause display error if a patch edit the
function. There is am example following:
https://www.spinics.net/lists/netdev/msg476141.html
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 8491000754796c838a0081c267f9dd54ad2ccba3.
It is duplicate to add statistics of netdev for ethtool -S.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add entry for the Socionext Netsec controller driver and DT bindings.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver adds support for Socionext "netsec" IP Gigabit
Ethernet + PHY IP used in the Synquacer SC2A11 SoC.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|