Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently on dequeue() ETF only drops the first expired packet, which
causes a problem if the next packet is already expired. When this
happens, the watchdog will be configured with a time in the past, fire
straight way and the packet will finally be dropped once the dequeue()
function of the qdisc is called again.
We can save quite a few cycles and improve the overall behavior of the
qdisc if we drop all expired packets if the next packet is expired.
This should allow ETF to recover faster from bad situations. But
packet drops are still a very serious warning that the requirements
imposed on the system aren't reasonable.
This was inspired by how the implementation of hrtimers use the
rb_tree inside the kernel.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is just a refactor that will simplify the implementation of the
next patch in this series which will drop all expired packets on the
dequeue flow.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ETF's peek() operation is heavily used so use an rb_root_cached instead
and leverage rb_first_cached() which will run in O(1) instead of
O(log n).
Even if on 'timesortedlist_clear()' we could be using rb_erase(), we
choose to use rb_erase_cached(), because if in the future we allow
runtime changes to ETF parameters, and need to do a '_clear()', this
might cause some hard to debug issues.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no point in firing the qdisc watchdog if there are no future
skbs pending in the queue and the watchdog had been set previously.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we can use bpf or tcp tracepoint to conveniently trace the tcp
state transition at the run time.
So we don't need to do this stuff at the compile time anymore.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The manual states that the checksum cannot lie in the last DWORD of the
transmission, so add a basic check for this and fall back to software
checksumming the packet.
This only seems to trigger for ACK packets with no options or data to
return to the other end, and the use of the tx-alignment option makes
it more likely to happen.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change the RX code to use get_unaligned_le32() instead of the combo
of memcpy and cpu_to_le32s(&var).
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The smsc95xx_tx_fixup is doing multiple calls to skb_push() to
put an 8-byte command header onto the packet. It would be easier
to do one skb_push() and then copy the data in once the push is
done.
We also make the code smaller by using proper unaligned puts for
the header. This merges in the CPU to LE32 conversion as well and
makes the whole sequence easier to understand hopefully.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The smsc95xx driver already takes into account the NET_IP_ALIGN
parameter when setting up the receive packet data, which means
we do not need to worry about aligning the packets in the usbnet
driver.
Adding the EVENT_NO_IP_ALIGN means that the IPv4 header is now
passed to the ip_rcv() routine with the start on an aligned address.
Tested on Raspberry Pi B3.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for byte queue limit.
On NAPI poll, we save the total number of Tx confirmed frames/bytes
and register them with bql at the end of the poll function.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change the frame consume callback signature:
* the entire FQ structure is passed to the callback instead
of just the queue index
* the NAPI structure can be easily obtained from the channel
it is associated to, so we don't need to pass it explicitly
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The DPNI object on which we build a network interface has a
certain number of {Rx, Tx, Tx confirmation} frame queues as
resources. The default hardware setup offers one queue of each
type, as well as one DPCON channel, for each core available
in the system.
There are however cases where the number of queues is greater
than the number of cores or channels. Until now, we configured
and used all the frame queues associated with a DPNI, even if it
meant assigning multiple queues of one type to the same channel.
Update the driver to only use a number of queues equal to the
number of channels, ensuring each channel will contain exactly
one Rx and one Tx confirmation queue.
>From the user viewpoint, this change is completely transparent.
Performance wise there is no impact in most scenarios. In case
the number of queues is larger than and not a multiple of the
number of channels, Rx hash distribution offers now better load
balancing between cores, which can have a positive impact on
overall system performance.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the vlan packet offloads are registered only upon 8021q module
load. However, even without this module loaded, the offloads could be
utilized, for example by openvswitch datapath. As reported by Michael,
that causes 2x to 5x performance improvement, depending on a testcase.
So move the vlan offload registrations into vlan_core and make this
available even without 8021q module loaded.
Reported-by: Michael Shteinbok <michaelsh86@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Michael Shteinbok <michaelsh86@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a missing indentation before the declaration of port. Add
it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial fix, the spelling of "failded" is incorrect in dev_err and
dev_warn messages. Fix this.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that the icplus driver has been fixed all PHY drivers supporting
interrupts have both callbacks (config_intr and ack_interrupt)
implemented - as it should be. Therefore phy_drv_supports_irq()
can be changed now to check for both callbacks being implemented.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace VLAN_TAG_PRESENT with single bit flag and free up
VLAN.CFI overload. Now VLAN.CFI is visible in networking stack
and can be passed around intact.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Wrap VLAN_PRESENT bit using macro like PKT_TYPE_* and CLONED_*,
as used by BPF code.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netperf udp stream shows that eth_type_trans takes certain cpu,
so adjust the mac address check order, and firstly check if it
is device address, and only check if it is multicast address
only if not the device address.
After this change:
To unicast, and skb dst mac is device mac, this is most of time
reduce a comparision
To unicast, and skb dst mac is not device mac, nothing change
To multicast, increase a comparision
Before:
1.03% [kernel] [k] eth_type_trans
After:
0.78% [kernel] [k] eth_type_trans
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
if list is NULL pointer, and the following access of list
will trigger panic, which is same as BUG_ON
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
lib/test_objagg.c: In function ‘test_delta_action_item’:
./include/linux/printk.h:308:2: warning: ‘errmsg’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Track the basic codepaths of delta handling, using objagg tracepoints.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow ERP sharing for multiple mask. Do it by properly implementing
delta_create() objagg object. Use the computed delta info for inserting
rules in A-TCAM.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Later on the same code is going to be needed for deltas as well. So push
the procedures related to increment and decrement of num_ctcam_erps
into a separate helpers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since two remaining users of mlxsw_afk_encode() do not specify
block ranges to work on, remove the args. Also, key/mask is always
non-NULL now, so skip the checks.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No need to do key encoding again in
mlxsw_sp_acl_atcam_12kb_lkey_id_get(). Instead of that, introduce
a new helper that would just clear unused blocks.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change order so it is aligned with the usual case where the "write_to"
buffer comes as the first arg.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The device requires that the master mask of each region will be
composed from a logical OR between all the unmasked bits in the region.
Currently, this is just a logical OR between all the eRPs used in the
region, but the next patch is going to introduce delta bits support
which need to be taken into account as well.
Since the eRP does not include the delta bits, pass the key pointer to
mlxsw_sp_acl_erp_master_mask_set/clear instead. Convert key->mask to
the bitmap on fly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the ERPs are tracked internally in a hashtable. Benefit from
the newly introduced objagg library and use it to track ERPs. At this
point, there is no nesting of objects done, as the delta_create callback
always returns -EOPNOTSUPP. On the way, add "mask" into ERP mask get and
set functions and struct names.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This lib tracks objects which could be of two types:
1) root object
2) nested object - with a "delta" which differentiates it from
the associated root object
The objects are tracked by a hashtable and reference-counted. User is
responsible of implementing callbacks to create/destroy root entity
related to each root object and callback to create/destroy nested object
delta.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order for this to behave as required with delta bits, change the mask
for rule with handle 103.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order for this to behave as required with delta bits, change the mask
for rule with handle 103.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
__tcp_checksum_complete() is 100% same with __skb_checksum_complete()
and there is no other caller except tcp_checksum_complete().
So, just use __skb_checksum_complete() there.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added support in tc flower for filtering based on port ranges.
Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
action drop
$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto tcp
dst_port range 20-30
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 85 sec used 3 sec
Action statistics:
Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
skip_hw action drop
$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto tcp
dst_ip 192.168.1.1
dst_port range 100-200
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 2 ref 1 bind 1 installed 58 sec used 2 sec
Action statistics:
Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
v4:
1. Added condition before setting port key.
2. Organized setting and dumping port range keys into functions
and added validation of input range.
v3:
1. Moved new fields in UAPI enum to the end of enum.
2. Removed couple of empty lines.
v2:
Addressed Jiri's comments:
1. Added separate functions for dst and src comparisons.
2. Removed endpoint enum.
3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
lookup.
4. Cleaned up fl_lookup function.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently netdev_rx_csum_fault() only shows a device name,
we need more information about the skb for debugging csum
failures.
Sample output:
ens3: hw csum failure
dev features: 0x0000000000014b89
skb len=84 data_len=0 pkt_type=0 gso_size=0 gso_type=0 nr_frags=0 ip_summed=0 csum=0 csum_complete_sw=0 csum_valid=0 csum_level=0
Note, I use pr_err() just to be consistent with the existing one.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We already have a workaround for a couple of switches whose internal
PHYs only have the Marvel OUI, but no model number. We detect such
PHYs and give them the 6390 ID as the model number. However the
mv88e6161 has two SERDES interfaces in the same address range as its
internal PHYs. These suffer from the same problem, the Marvell OUI,
but no model number. As a result, these SERDES interfaces were getting
the same PHY ID as the mv88e6390, even though they are not PHYs, and
the Marvell PHY driver was trying to drive them.
Add a special case to stop this from happen.
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When HW GRO enable, protocol stack will not do GRO again,
driver should add gro param to the skb for the protocol
stack..
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MAX_SKB_FRAGS in protocol stack is defined as:
MAX_SKB_FRAGS is 17 when PAGE_SIZE is 4K. If HW enable GRO, it may
merge small packets and the rx buffer may be more than
MAX_SKB_FRAGS. So driver will add skb chain when RX buffer num.
more than MAX_SKB_FRAGS.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support of ethtool -K to enable/disable
hardware GRO in HNS3 PF/VF driver.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "FE bit" in the description means the last description for
a packets. When HW GRO enable, HW write data to ring every
packet/buffer, there is greater probability that driver handle
with the describtion but HW still not set the "FE bit".
When drier handle the packet and HW still not set "FE bit",
driver stores skb and bd_num in rx ring, and continue to use the
skb and bd_num in next napi.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
HNS3 hardware Revision B(=0x21) supports Hardware GRO feature. This
patch enables this feature in the HNS3 PF/VF driver.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move IRQ configuration for IP101A/G from config_init to config_intr
callback. Reasons:
1. This allows phylib to disable interrupts if needed.
2. Icplus was the only driver supporting interrupts w/o defining a
config_intr callback. Now we can add a phylib plausibility check
disabling interrupt mode if one of the two irq-related callbacks
isn't defined.
I don't own hardware with this PHY, and the change is based on the
datasheet for IP101A LF (which is supposed to be register-compatible
with IP101A/G). Change is compile-tested only.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If a TX hang occurs, we attempt to recover by incrementally resetting.
If we're starved for CPU time, it's possible the reset doesn't actually
complete (or even fire) before another tx_timeout fires causing us to
fly through the different resets without actually doing them.
This adds a bit to set and check if a timeout recovery is already
pending and, if so, bail out of tx_timeout. The bit will get cleared at
the end of i40e_rebuild when reset is complete.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The i40e driver complains about unprivileged VFs trying to configure
promiscuous mode each time a VF reset occurs. This isn't the fault of
the poor VF driver - the PF driver itself is making the request.
To fix this, skip the privilege check if the request is to disable all
promiscuous activity. This gets rid of the bogus message, but doesn't
affect privilege checks, since we really only care if the unprivileged
VF is trying to enable promiscuous mode.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
When using port VLAN, for VFs, and setting priority bits, the device
was sending out incorrect priority bits, and also setting the CFI
bit incorrectly.
To fix this, changed shift and mask bit definition for this function, to
use the correct ones.
Signed-off-by: Richard Rodriguez <richard.rodriguez@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|