Age | Commit message (Collapse) | Author | Files | Lines |
|
This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").
DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.
Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f85d ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.
Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.
As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.
Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change merges the ixgbevf_tx_map call and the ixgbevf_tx_queue call
into a single function. In order to make room for this setting of cmd_type
and olinfo flags is done in separate functions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch takes advantage of the dma buffer always being present in the
first descriptor and mapped as single. As such we can call dma_unmap_single
and don't need to check for DMA mapping in ixgbevf_clean_tx_irq().
In addition this patch makes use of the DMA API.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change makes it so that the first tx_buffer structure acts as a
central storage location for most of the info about the skb we are about
to transmit.
In addition this patch makes tx_flags part of the ixgbevf_tx_buffer struct.
This allows us to use the flags directly from the stucture and as result
removes the tx_flags parameter from some functions. Also as a cleanup
mapped_as_page is folded into tx_flags and some unused flags were removed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds counters for tx_restart_queue and tx_timeout_count.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes the Tx/Rx counters for checksum offload.
The Tx counter was never updated and the Rx counter is of limited use.
This is in effort to clean up the counters and make them consistent
with the counters shown by ixgbe.
Also this patch removes some members of the adapter structure that were
never used and shuffles others to reduce number of holes.
before:
/* size: 1568, cachelines: 25, members: 48 */
/* sum members: 1519, holes: 10, sum holes: 43 */
/* padding: 6 */
/* last cacheline: 32 bytes */
after:
/* size: 1480, cachelines: 24, members: 43 */
/* sum members: 1479, holes: 1, sum holes: 1 */
/* last cacheline: 8 bytes */
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch moves hot-path specific statistics into the ring structure.
This allows us to drop the adapter structure in some functions and should
help with performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch cleans up the code by removing the adapter structure as
parameter from multiple functions. The adapter structure was previously
being used to access the dev pointer, but this can also be done via the
ixgbevf_ring structure. This way we can drop the adapter as parameter from
these functions.
This patch also includes small cleanups in some error code paths.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's now built as a separate utility module, and enabling BT selects
that module in Kconfig. This fixes:
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here
net/ieee802154/built-in.o: In function `lowpan_header_compress':
net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here
net/ieee802154/built-in.o: In function `lowpan_process_data':
net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here
make[1]: *** [net/built-in.o] Error 1
(this change probably simply wasn't "git add"d to a53d34c3465b)
Fixes: a53d34c3465b ("net: move 6lowpan compression code to separate module")
Fixes: 18722c247023 ("Bluetooth: Enable 6LoWPAN support for BT LE devices")
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rework the device ID #defines to follow the _DEV_ID convention
already established in the other Intel drivers.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow compiling DCB related files if I40E_DCB option
is supported in the kernel configuration.
DCB is disabled by default.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Tested-By: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds capability to configure DCB on i40e network
interfaces using Intel XL710 adapter firmware APIs.
By default all VSIs are only enabled for the default traffic
class enabled by firmware for any given PF. The driver would
query the firmware for the traffic classes that are enabled for
the port and reconfigure the LAN VSI to match to the port traffic
class settings. All other VSIs are only enabled for the default
traffic class settings for now.
The driver registers and listens to firmware events that may
require change in the DCB settings. It may reconfigure the VSI
settings based on these events.
This patch exposes IEEE DCBNL interfaces for the i40e driver to
allow any application to query the DCB settings on the adapter.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-By: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Intel XL710 series of adapters support QoS as per the
IEEE 802.1 DCB (Data Center Bridging) standard.
This is supported in conjuction with:
- Enhanced Transmission Selection (ETS) - IEEE 802.1Qaz
- Priority Flow Control (PFC) - IEEE 802.1Qbb
- DCB eXchange Protocol (DCBX) - IEEE 802.1Qaz
On Intel XL710 adapters DCBX is performed by the adapter
firmware. The firmware runs DCBX in willing mode and configures
the port as per the DCB settings recommended by it's link
partner.
By default in absence of any DCBX; firmware would configure the
port with a single traffic class and all of the port bandwith
will be allocated to that traffic class.
This patch adds functions and calls to support querying and
configuring DCB using firmware APIs.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-By: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The i40e hardware was generating some inconsistent results
when using current programming methods. This refactor
fixes the inconsistencies that were preventing clean
unloads of the driver, and moves the queues for handling
flow director errors into their own hardware VSI.
This patch also implements a corrected version of the
basic ethtool add ntuple rule, which will disable
the driver's automatic flow programming. A future patch
adds remove/replay/list support for ntuple.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The FLAG_FDIR_* defines can be renamed to be more descriptive.
This patch is in preparation for the following where the fdir
code is refactored.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix more whitespace issues, including making some locals declared
in a nicer order.
Also update Copyright string printed when the driver loads.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove a workaround that is no longer necessary and implement
a better understanding of what firmware is returning in the MSI-X
vector count. This makes it so that the driver ends up with the
right amount of queues when using all available MSI-X vectors.
Change-ID: I34e60cc71dcfb1b5412f37df956fedcc49ade187
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Compile testing with higher warning levels found this complaint:
i40e_nvm.c: warning: 'checksum_local' may be used uninitialized in
this function
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's now built as a separate utility module, and enabling BT selects
that module in Kconfig. This fixes:
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here
net/ieee802154/built-in.o: In function `lowpan_header_compress':
net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here
net/ieee802154/built-in.o: In function `lowpan_process_data':
net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here
make[1]: *** [net/built-in.o] Error 1
(this change probably simply wasn't "git add"d to a53d34c3465b)
Fixes: a53d34c3465b ("net: move 6lowpan compression code to separate module")
Fixes: 18722c247023 ("Bluetooth: Enable 6LoWPAN support for BT LE devices")
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit ab7db91705e9 ("virtio-net: auto-tune mergeable rx buffer size for
improved performance") introduced a virtio-net dependency on EWMA.
The inclusion of EWMA is controlled by CONFIG_AVERAGE. Fix build error
when CONFIG_AVERAGE is not enabled by adding select AVERAGE to
virtio-net's Kconfig entry.
Build failure reported using config make ARCH=s390 defconfig.
Signed-off-by: Michael Dalton <mwdalton@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Bump the version number to better match functionality provided with out of
tree driver of the same version.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Bump the version number to better match functionality provided with out
of tree driver of the same version.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds braces around the ixgbe_qv_lock_* calls which previously only
had braces around the if portion. Kernel style guidelines for this require
parenthesis around all conditions if they are required around one. In addition
the comment while not illegal C syntax makes the code look wrong at a cursory
glance. This patch corrects the style and adds braces so that the full if-else
block is uniform.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The s390 bpf jit compiler emits the signed divide instructions "dr" and "d"
for unsigned divisions.
This can cause problems: the dividend will be zero extended to a 64 bit value
and the divisor is the 32 bit signed value as specified A or X accumulator,
even though A and X are supposed to be treated as unsigned values.
The divide instrunctions will generate an exception if the result cannot be
expressed with a 32 bit signed value.
This is the case if e.g. the dividend is 0xffffffff and the divisor either 1
or also 0xffffffff (signed: -1).
To avoid all these issues simply use unsigned divide instructions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to get correct drop monitor notifications for dropped
packets, we should call kfree_skb() instead of dev_kfree_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest. Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.
Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes. Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add sub-directory under /sys/class/net/<interface>/slave with
read-only attributes for slave. Directory only appears when
<interface> is a slave.
$ tree /sys/class/net/eth2/slave/
/sys/class/net/eth2/slave/
├── ad_aggregator_id
├── link_failure_count
├── mii_status
├── perm_hwaddr
├── queue_id
└── state
$ cat /sys/class/net/eth2/slave/*
2
0
up
40:02:10:ef:06:01
0
active
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jesse Brandeburg reported that commit acaf4e70997f caused a panic
when adding a network namespace while vxlan module was present in
the system:
[<ffffffff814d0865>] vxlan_lowerdev_event+0xf5/0x100
[<ffffffff816e9e5d>] notifier_call_chain+0x4d/0x70
[<ffffffff810912be>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff810912d6>] raw_notifier_call_chain+0x16/0x20
[<ffffffff815d9610>] call_netdevice_notifiers_info+0x40/0x70
[<ffffffff815d9656>] call_netdevice_notifiers+0x16/0x20
[<ffffffff815e1bce>] register_netdevice+0x1be/0x3a0
[<ffffffff815e1dce>] register_netdev+0x1e/0x30
[<ffffffff814cb94a>] loopback_net_init+0x4a/0xb0
[<ffffffffa016ed6e>] ? lockd_init_net+0x6e/0xb0 [lockd]
[<ffffffff815d6bac>] ops_init+0x4c/0x150
[<ffffffff815d6d23>] setup_net+0x73/0x110
[<ffffffff815d725b>] copy_net_ns+0x7b/0x100
[<ffffffff81090e11>] create_new_namespaces+0x101/0x1b0
[<ffffffff81090f45>] copy_namespaces+0x85/0xb0
[<ffffffff810693d5>] copy_process.part.26+0x935/0x1500
[<ffffffff811d5186>] ? mntput+0x26/0x40
[<ffffffff8106a15c>] do_fork+0xbc/0x2e0
[<ffffffff811b7f2e>] ? ____fput+0xe/0x10
[<ffffffff81089c5c>] ? task_work_run+0xac/0xe0
[<ffffffff8106a406>] SyS_clone+0x16/0x20
[<ffffffff816ee689>] stub_clone+0x69/0x90
[<ffffffff816ee329>] ? system_call_fastpath+0x16/0x1b
Apparently loopback device is being registered first and thus we
receive an event notification when vxlan_net is not ready. Hence,
when we call net_generic() and request vxlan_net_id, we seem to
access garbage at that point in time. In setup_net() where we set
up a newly allocated network namespace, we traverse the list of
pernet ops ...
list_for_each_entry(ops, &pernet_list, list) {
error = ops_init(ops, net);
if (error < 0)
goto out_undo;
}
... and loopback_net_init() is invoked first here, so in the middle
of setup_net() we get this notification in vxlan. As currently we
only care about devices that unregister, move access through
net_generic() there. Fix is based on Cong Wang's proposal, but
only changes what is needed here. It sucks a bit as we only work
around the actual cure: right now it seems the only way to check if
a netns actually finished traversing all init ops would be to check
if it's part of net_namespace_list. But that I find quite expensive
each time we go through a notifier callback. Anyway, did a couple
of tests and it seems good for now.
Fixes: acaf4e70997f ("net: vxlan: when lower dev unregisters remove vxlan dev as well")
Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 43dc4e01 Limit number of reported VFs to device
specific value It doesn't work and always returns -EBUSY because VFs are
already enabled.
ixgbe_enable_sriov()
pci_enable_sriov()
sriov_enable()
{
... ..
iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
pci_cfg_access_lock(dev);
... ...
}
pci_sriov_set_totalvfs()
{
... ...
if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
return -EBUSY;
...
}
So should set driver_max_VFs with pci_sriov_set_totalvfs() before
enable VFs with ixgbe_enable_sriov().
V2: revised for net-next tree.
Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because ixgbe driver limit the max number of VF
functions could be enabled to 63, so define one macro IXGBE_MAX_VFS_DRV_LIMIT
and cleanup the const 63 in code.
v3: revised for net-next tree.
Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch :
1) Remove a dst leak if DST_NOCACHE was set on dst
Fix this by holding a reference only if dst really cached.
2) Remove a lockdep warning in __tunnel_dst_set()
This was reported by Cong Wang.
3) Remove usage of a spinlock where xchg() is enough
4) Remove some spurious inline keywords.
Let compiler decide for us.
Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The r7s72100 SoC includes a fast ethernet controller.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Return a boolean from sh_eth_is_gether() and refactor it as a one-liner.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RFC 3810 defines two type of messages for multicast
listeners. The "Current State Report" message, as the name
implies, refreshes the *current* state to the querier.
Since the querier sends Query messages periodically, there
is no need to retransmit the report.
On the other hand, any change should be reported immediately
using "State Change Report" messages. Since it's an event
triggered by a change and that it can be affected by packet
loss, the rfc states it should be retransmitted [RobVar] times
to make sure routers will receive timely.
Currently, we are sending "Current State Reports" after
DAD is completed. Before that, we send messages using
unspecified address (::) which should be silently discarded
by routers.
This patch changes to send "State Change Report" messages
after DAD is completed fixing the behavior to be RFC compliant
and also to pass TAHI IPv6 testsuite.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SO_MAX_PACING_RATE definition on parisc got a typo.
Its not too late to fix it, before 3.13 is official.
Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit 1ec047eb4751e3 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.
Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.
Fixes: 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
Reported-by: Jiri Pirko <jiri@resnulli.us>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.
This patch checks twice __tcp_get_metrics(). First without holding the
lock, then while holding the lock. The second one is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.
Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove function qlcnic_enable_eswitch which was defined
but never used in current code.
Compile tested only.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Functions only used in one file should be static.
Found by running make namespacecheck
Compile tested only.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is following the commit b903d324bee262 (ipv6: tcp: fix TCLASS
value in ACK messages sent from TIME_WAIT).
For the same reason than tclass, we have to store the flow label in the
inet_timewait_sock to provide consistency of flow label on the last ACK.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
handling for rds. chpfirst is the result of __this_cpu_read(), so it is
an absolute pointer and not __percpu. Therefore, __this_cpu_write()
should not operate on chpfirst, but rather on cache->percpu->first, just
like __this_cpu_read() did before.
Cc: <stable@vger.kernel.org> # 3.8+
Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add initial support for per-rx queue sysfs attributes to virtio-net. If
mergeable packet buffers are enabled, adds a read-only mergeable packet
buffer size sysfs attribute for each RX queue.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To ensure ewma_read() without a lock returns a valid but possibly
out of date average, modify ewma_add() by using ACCESS_ONCE to prevent
intermediate wrong values from being written to avg->internal.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend existing support for netdevice receive queue sysfs attributes to
permit a device-specific attribute group. Initial use case for this
support will be to allow the virtio-net device to export per-receive
queue mergeable receive buffer size.
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag
allocators") changed the mergeable receive buffer size from PAGE_SIZE to
MTU-size, introducing a single-stream regression for benchmarks with large
average packet size. There is no single optimal buffer size for all
workloads. For workloads with packet size <= MTU bytes, MTU + virtio-net
header-sized buffers are preferred as larger buffers reduce the TCP window
due to SKB truesize. However, single-stream workloads with large average
packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers
are used.
This commit auto-tunes the mergeable receiver buffer packet size by
choosing the packet buffer size based on an EWMA of the recent packet
sizes for the receive queue. Packet buffer sizes range from MTU_SIZE +
virtio-net header len to PAGE_SIZE. This improves throughput for
large packet workloads, as any workload with average packet size >=
PAGE_SIZE will use PAGE_SIZE buffers.
These optimizations interact positively with recent commit
ba275241030c ("virtio-net: coalesce rx frags when possible during rx"),
which coalesces adjacent RX SKB fragments in virtio_net. The coalescing
optimizations benefit buffers of any size.
Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs
with all offloads & vhost enabled. All VMs and vhost threads run in a
single 4 CPU cgroup cpuset, using cgroups to ensure that other processes
in the system will not be scheduled on the benchmark CPUs. Trunk includes
SKB rx frag coalescing.
net-next w/ virtio_net before 2613af0ed18a (PAGE_SIZE bufs): 14642.85Gb/s
net-next (MTU-size bufs): 13170.01Gb/s
net-next + auto-tune: 14555.94Gb/s
Jason Wang also reported a throughput increase on mlx4 from 22Gb/s
using MTU-sized buffers to about 26Gb/s using auto-tuning.
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC
mergeable rx buffer allocations. This commit migrates virtio-net to use
per-receive queue page frags for GFP_ATOMIC allocation. This change unifies
mergeable rx buffer memory allocation, which now will use skb_refill_frag()
for both atomic and GFP-WAIT buffer allocations.
To address fragmentation concerns, if after buffer allocation there
is too little space left in the page frag to allocate a subsequent
buffer, the remaining space is added to the current allocated buffer
so that the remaining space can be used to store packet data.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb_page_frag_refill currently permits only order-0 page allocs
unless GFP_WAIT is used. Change skb_page_frag_refill to attempt
higher-order page allocations whether or not GFP_WAIT is used. If
memory cannot be allocated, the allocator will fall back to
successively smaller page allocs (down to order-0 page allocs).
This change brings skb_page_frag_refill in line with the existing
page allocation strategy employed by netdev_alloc_frag, which attempts
higher-order page allocations whether or not GFP_WAIT is set, falling
back to successively lower-order page allocations on failure. Part
of migration of virtio-net to per-receive queue page frag allocators.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The error code was not set if change indev fail, so the error
condition wasn't reflected in the return value. Fix to return a
negative error code from this error handling case instead of 0.
Fixes: 2519a602c273 ('net_sched: optimize tcf_match_indev()')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Standardize the behaviour of waiting for events in TIPC recvmsg()
so that all variables of socket or port structures are protected
within socket lock, allowing the process of calling recvmsg() to
be woken up at appropriate time.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|