linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-12-03	vxlan: add support for underlay in non-default VRF	Alexis Bauvin	1	-8/+24
	Creating a VXLAN device with is underlay in the non-default VRF makes egress route lookup fail or incorrect since it will resolve in the default VRF, and ingress fail because the socket listens in the default VRF. This patch binds the underlying UDP tunnel socket to the l3mdev of the lower device of the VXLAN device. This will listen in the proper VRF and output traffic from said l3mdev, matching l3mdev routing rules and looking up the correct routing table. When the VXLAN device does not have a lower device, or the lower device is in the default VRF, the socket will not be bound to any interface, keeping the previous behaviour. The underlay l3mdev is deduced from the VXLAN lower device (IFLA_VXLAN_LINK). +----------+ +---------+ \| \| \| \| \| vrf-blue \| \| vrf-red \| \| \| \| \| +----+-----+ +----+----+ \| \| \| \| +----+-----+ +----+----+ \| \| \| \| \| br-blue \| \| br-red \| \| \| \| \| +----+-----+ +---+-+---+ \| \| \| \| +-----+ +-----+ \| \| \| +----+-----+ +------+----+ +----+----+ \| \| lower device \| \| \| \| \| eth0 \| <- - - - - - - \| vxlan-red \| \| tap-red \| (... more taps) \| \| \| \| \| \| +----------+ +-----------+ +---------+ Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	l3mdev: add function to retreive upper master	Alexis Bauvin	2	-0/+40
	Existing functions to retreive the l3mdev of a device did not walk the master chain to find the upper master. This patch adds a function to find the l3mdev, even indirect through e.g. a bridge: +----------+ \| \| \| vrf-blue \| \| \| +----+-----+ \| \| +----+-----+ \| \| \| br-blue \| \| \| +----+-----+ \| \| +----+-----+ \| \| \| eth0 \| \| \| +----------+ This will properly resolve the l3mdev of eth0 to vrf-blue. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	udp_tunnel: add config option to bind to a device	Alexis Bauvin	3	-0/+34
	UDP tunnel sockets are always opened unbound to a specific device. This patch allow the socket to be bound on a custom device, which incidentally makes UDP tunnels VRF-aware if binding to an l3mdev. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Merge branch 'mlxsw-fw_load_policy'	David S. Miller	8	-14/+127
	Ido Schimmel says: ==================== mlxsw: Add 'fw_load_policy' devlink parameter Shalom says: Currently, drivers do not have the ability to control the firmware loading policy and they always use their own fixed policy. This prevents drivers from running the device with a different firmware version for testing and/or debugging purposes. For example, testing a firmware bug fix. For these situations, the new devlink generic parameter, 'fw_load_policy', gives the ability to control this option and allows drivers to run with a different firmware version than required by the driver. Patch #1 adds the new parameter to devlink. The other two patches, #2 and #3, add support for this parameter in the mlxsw driver. Example: # Query the devlink parameters supported by the device $ devlink dev param show pci/0000:03:00.0: name fw_load_policy type generic values: cmode driverinit value driver # Flash new firmware using ethtool $ ethtool -f swp1 mellanox/mlxsw_spectrum-13.1703.4.mfa2 # Toggle parameter $ devlink dev param set pci/0000:03:00.0 name fw_load_policy value flash cmode driverinit # devlink reset $ devlink dev reload pci/0000:03:00.0 # Query firmware version to show changes took affect $ ethtool -i swp1 driver: mlxsw_spectrum version: 1.0 firmware-version: 13.1703.4 expansion-rom-version: bus-info: 0000:03:00.0 supports-statistics: yes supports-test: no supports-eeprom-access: no supports-register-dump: no supports-priv-flags: no iproute2 patches available here: https://github.com/tshalom/iproute2-next v2: * Change 'fw_version_check' to 'fw_load_policy' with values 'driver' and 'flash' (Jakub) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	mlxsw: spectrum: Load firmware version based on devlink parameter	Shalom Toledo	3	-0/+75
	Load firmware version based on 'fw_load_policy' devlink parameter. The driver supports these two options: * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER (0) Default, load firmware version preferred by the driver * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1) Load firmware currently stored in flash The second option, 'flash', allow the device to run with different firmware version than preferred by the driver for testing and/or debugging purposes. For example, testing a firmware bug fix. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	mlxsw: core: Reset firmware after flash during driver initialization	Shalom Toledo	2	-14/+29
	After flashing new firmware during the driver initialization flow (reload or not), the driver should do a firmware reset when it gets -EAGAIN in order to load the new one. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	devlink: Add 'fw_load_policy' generic parameter	Shalom Toledo	4	-0/+23
	Many drivers load the device's firmware image during the initialization flow either from the flash or from the disk. Currently this option is not controlled by the user and the driver decides from where to load the firmware image. 'fw_load_policy' gives the ability to control this option which allows the user to choose between different loading policies supported by the driver. This parameter can be useful while testing and/or debugging the device. For example, testing a firmware bug fix. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: phy: don't allow __set_phy_supported to add unsupported modes	Heiner Kallweit	1	-28/+14
	Currently __set_phy_supported allows to add modes w/o checking whether the PHY supports them. This is wrong, it should never add modes but only remove modes we don't want to support. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	net: usb: aqc111: Initialize wol_cfg with memset in aqc111_suspend	Nathan Chancellor	1	-1/+3
	Clang warns: drivers/net/usb/aqc111.c:1326:37: warning: suggest braces around initialization of subobject [-Wmissing-braces] struct aqc111_wol_cfg wol_cfg = { 0 }; ^ {} 1 warning generated. Use memset to initialize the object to take compiler instrumentation out of the equation. Fixes: e58ba4544c77 ("net: usb: aqc111: Add support for wake on LAN by MAGIC packet") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	net: qualcomm: rmnet: Remove set but not used variable 'cmd'	YueHaibing	1	-2/+0
	Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c: In function 'rmnet_map_do_flow_control': drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:23:36: warning: variable 'cmd' set but not used [-Wunused-but-set-variable] struct rmnet_map_control_command *cmd; 'cmd' not used anymore now, should also be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	tun: implement carrier change	Nicolas Dichtel	2	-1/+27
	The userspace may need to control the carrier state. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	net: reorder flowi_common fields to avoid holes	Paolo Abeni	1	-1/+1
	the flowi* structures are used and memsetted by server functions in critical path. Currently flowi_common has a couple of holes that we can eliminate reordering the struct fields. As a side effect, both flowi4 and flowi6 shrink by 8 bytes. Before: pahole -EC flowi_common struct flowi_common { // ... /* size: 40, cachelines: 1, members: 10 / / sum members: 32, holes: 1, sum holes: 4 / / padding: 4 / / last cacheline: 40 bytes / }; pahole -EC flowi6 struct flowi6 { // ... / size: 88, cachelines: 2, members: 6 / / padding: 4 / / last cacheline: 24 bytes / }; pahole -EC flowi4 struct flowi4 { // ... / size: 56, cachelines: 1, members: 4 / / padding: 4 / / last cacheline: 56 bytes / }; After: struct flowi_common { // ... / size: 32, cachelines: 1, members: 10 / / last cacheline: 32 bytes / }; struct flowi6 { // ... / size: 80, cachelines: 2, members: 6 / / padding: 4 / / last cacheline: 16 bytes / }; struct flowi4 { // ... / size: 48, cachelines: 1, members: 4 / / padding: 4 / / last cacheline: 48 bytes */ }; Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	Merge branch 'mlxsw-Add-VxLAN-support-with-VLAN-aware-bridges'	David S. Miller	8	-63/+1441
	Ido Schimmel says: ==================== mlxsw: Add VxLAN support with VLAN-aware bridges Commit 53e50a6ec24d ("Merge branch 'mlxsw-Add-VxLAN-support'") added mlxsw support for VxLAN when the VxLAN device was enslaved to VLAN-unaware bridges. This patchset extends mlxsw to also support VxLAN with VLAN-aware bridges. With VLAN-aware bridges, the VxLAN device's VNI is mapped to the VLAN that is configured as 'pvid untagged' on the corresponding bridge port. To prevent ambiguity, mlxsw forbids configurations in which the same VLAN is configured as 'pvid untagged' on multiple VxLAN devices. Patches #1-#2 add the necessary APIs in mlxsw and the bridge driver. Patches #3-#4 perform small refactoring in order to prepare mlxsw for VLAN-aware support. Patch #5 finally enables the enslavement of VxLAN devices to a VLAN-aware bridge. Among other things, it extends mlxsw to handle switchdev notifications about VLAN add / delete on a VxLAN device enslaved to an offloaded VLAN-aware bridge. Patches #6-#8 add selftests to test the new functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	selftests: forwarding: Add VxLAN test with a VLAN-aware bridge	Ido Schimmel	2	-0/+800
	The test is very similar to its VLAN-unaware counterpart (vxlan_bridge_1d.sh), but instead of using multiple VLAN-unaware bridges, a single VLAN-aware bridge is used with multiple VLANs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	selftests: mlxsw: Add a test for VxLAN configuration with a VLAN-aware bridge	Ido Schimmel	1	-1/+203
	Extend the existing VLAN-unaware tests with their VLAN-aware counterparts. This includes sanitization of invalid configuration and offload indication on the local route performing decapsulation and the FDB entries perform encapsulation. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	selftests: mlxsw: Consider VLAN-aware bridges as valid	Ido Schimmel	1	-1/+1
	Previous patches add the ability to work with VLAN-aware bridges and VxLAN devices, so make sure such configuration no longer fails. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges	Ido Schimmel	3	-14/+404
	Commit 1c30d1836aeb ("mlxsw: spectrum: Enable VxLAN enslavement to bridges") enabled the enslavement of VxLAN devices to bridges that have mlxsw ports (or their upper) as slaves. This patch extends mlxsw to also support VLAN-aware bridges. The patch is similar in nature to mentioned commit, but there is one major difference. With VLAN-aware bridges, the VxLAN device's VNI is mapped to the VLAN that is configured as PVID and egress untagged on the bridge port. Therefore, the driver is extended to listen to VLAN configuration on VxLAN devices of interest and enable / disable NVE encapsulation on the corresponding 802.1Q FIDs. To prevent ambiguity, the driver makes sure that a given VLAN is not configured as PVID and egress untagged on multiple VxLAN devices. This sanitization takes place both when a port is enslaved to a bridge with existing VxLAN devices and when a VLAN is added to / removed from a VxLAN device of interest. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	mlxsw: spectrum_switchdev: Prepare function for VLAN-aware bridges	Ido Schimmel	3	-9/+11
	The vxlan_join() function resolves the FID on which the VNI should be set and then sets the VNI. Currently, the FID is simply resolved according to the ifindex of the bridge device to which the VxLAN device is enslaved. This works because only VLAN-unaware bridges are supported. With VLAN-aware bridges the FID would need to be resolved based on the VLAN to which the VNI is mapped to. Add the VLAN ID to the argument list of the function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	mlxsw: spectrum_switchdev: Unify VxLAN leave function	Ido Schimmel	3	-38/+9
	The function mlxsw_sp_bridge_vxlan_leave() is currently split between VLAN-aware and VLAN-unaware bridges, but actually both types can use the same function. The function needs to resolve the FID that corresponds to the VxLAN device and disable NVE encapsulation on it. Instead of looking up the FID differently for VLAN-aware and VLAN-unaware bridges, we can always use the VxLAN's device VNI. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	mlxsw: spectrum_fid: Add API to lookup 802.1Q FIDs without creating them	Ido Schimmel	3	-2/+11
	In a similar fashion to commit 564c6d727aca ("mlxsw: spectrum_fid: Add APIs to lookup FID without creating it"), add a corresponding API to lookup 802.1Q FIDs. This is a prerequisite to VxLAN support with VLAN-aware bridges and will allow us to resolve a 802.1Q FID by its VLAN when an FDB entry is added on the bridge port of the VxLAN device. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	net: bridge: Extend br_vlan_get_pvid() for bridge ports	Ido Schimmel	1	-1/+5
	Currently, the function only works for the bridge device itself, but subsequent patches will need to be able to query the PVID of a given bridge port, so extend the function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	Merge branch 'qed-Doorbell-overflow-recovery'	David S. Miller	13	-47/+756
	Ariel Elior says: ==================== qed*: Doorbell overflow recovery Doorbell Overflow If sufficient CPU cores will send doorbells at a sufficiently high rate, they can cause an overflow in the doorbell queue block message fifo. When fill level reaches maximum, the device stops accepting all doorbells from that PF until a recovery procedure has taken place. Doorbell Overflow Recovery The recovery procedure basically means resending the last doorbell for every doorbelling entity. A doorbelling entity is anything which may send doorbells: L2 tx ring, rdma sq/rq/cq, light l2, vf l2 tx ring, spq, etc. This relies on the design assumption that all doorbells are aggregative, so last doorbell carries the information of all previous doorbells. APIs All doorbelling entities need to register with the mechanism before sending doorbells. The registration entails providing the doorbell address the entity would be using, and a virtual address where last doorbell data can be found. Typically fastpath structures already have this construct. Executing the recovery procedure Handling the attentions, iterating over all the registered entities and resending their doorbells, is all handled within qed core module. Relevance All doorbelling entities in all protocols need to register with the mechanism, via the new APIs. Technically this is quite simple (just call the API). Some protocol fastpath implementation may not have the doorbell data stored anywhere (compute it from scratch every time) and will have to add such a place. This is rare and is also better practice (save some cycles on the fastpath). Performance Penalty No performance penalty should incur as a result of this feature. If anything performance can improve by avoiding recalcualtion of doorbell data everytime doorbell is sent (in some flows). Add the database used to register doorbelling entities, and APIs for adding and deleting entries, and logic for traversing the database and doorbelling once on behalf of all entities. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	qede: Register l2 queues with doorbell overflow recovery mechanism	Ariel Elior	1	-0/+9
	All L2 queues funnel through this flow, so this would cover the regular RSS queues, as well queues created for VFs, mqos queues, xdp queues, etc. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	qed: Expose the doorbell overflow recovery mechanism to the protocol drivers	Ariel Elior	2	-0/+29
	Most of the doorbelling entities are outside of the core module. L2 queues, Roce queues, iscsi and fcoe all need to register. Make the APIs available for these drivers. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	qed: Register light L2 queues with doorbell overflow recovery mechanism	Ariel Elior	2	-10/+21
	Light L2 queues are doorbelling entities. Modify the implementation to keep the doorbell data necessary for doorbelling in well known location instead of recomputing every time. Register the LL2 queue with doorbell recovery mechanism. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	qed: Register slowpath queue doorbell with doorbell overflow recovery mechanism	Ariel Elior	2	-13/+38
	Slow path queue is a doorbelling entity. Register it with the overflow mechanism. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	qed: Use the doorbell overflow recovery mechanism in case of doorbell overflow	Ariel Elior	6	-24/+280
	In case of an attention from the doorbell queue block, analyze the HW indications. In case of a doorbell overflow, execute a doorbell recovery. Since there can be spurious indications (race conditions between multiple PFs), schedule a periodic task for checking whether a doorbell overflow may have been missed. After a set time with no indications, terminate the periodic task. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	qed: Add doorbell overflow recovery mechanism	Ariel Elior	4	-0/+379
	Add the database used to register doorbelling entities, and APIs for adding and deleting entries, and logic for traversing the database and doorbelling once on behalf of all entities. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	Merge branch 'rtnetlink-avoid-a-warning-in-rtnl_newlink'	David S. Miller	1	-161/+170
	Jakub Kicinski says: ==================== rtnetlink: avoid a warning in rtnl_newlink() I've been hoping for some time that someone more competent would fix the stack frame size warning in rtnl_newlink(), but looks like I'll have to take a stab at it myself :) That's the only warning I see in most of my builds. First patch refactors away a somewhat surprising if (1) code block. Reindentation will most likely cause cherry-pick problems but OTOH rtnl_newlink() doesn't seem to be changed often, so perhaps we can risk it in the name of cleaner code? Second patch fixes the warning in simplest possible way. I was pondering if there is any more clever solution, but I can't see it.. rtnl_newlink() is quite long with a lot of possible execution paths so doing memory allocations half way through leads to very ugly results. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	rtnetlink: avoid frame size warning in rtnl_newlink()	Jakub Kicinski	1	-3/+17
	Standard kernel compilation produces the following warning: net/core/rtnetlink.c: In function ‘rtnl_newlink’: net/core/rtnetlink.c:3232:1: warning: the frame size of 1288 bytes is larger than 1024 bytes [-Wframe-larger-than=] } ^ This should not really be an issue, as rtnl_newlink() stack is generally quite shallow. Fix the warning by allocating attributes with kmalloc() in a wrapper and passing it down to rtnl_newlink(), avoiding complexities on error paths. Alternatively we could kmalloc() some structure within rtnl_newlink(), slave attributes look like a good candidate. In practice it adds to already rather high complexity and length of the function. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	rtnetlink: remove a level of indentation in rtnl_newlink()	Jakub Kicinski	1	-159/+154
	rtnl_newlink() used to create VLAs based on link kind. Since commit ccf8dbcd062a ("rtnetlink: Remove VLA usage") statically sized array is created on the stack, so there is no more use for a separate code block that used to be the VLA's live range. While at it christmas tree the variables. Note that there is a goto-based retry so to be on the safe side the variables can no longer be initialized in place. It doesn't seem to matter, logically, but why make the code harder to read.. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	Merge branch 'nfp-update-TX-path-to-enable-repr-offloads'	David S. Miller	8	-37/+213
	Jakub Kicinski says: ==================== nfp: update TX path to enable repr offloads This set starts with three micro optimizations to the TX path. The improvement is measurable, but below 1% of CPU utilization. Patches 4 - 9 add basic TX offloads to representor devices, like checksum offload or TSO, and remove the unnecessary TX lock and Qdisc (our representors are software constructs on top of the PF). The last 2 patches add more info to error messages - id of command which failed and exact location of incorrect TLVs, very useful for debugging. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: report more info when reconfiguration fails	Jakub Kicinski	2	-2/+9
	FW reconfiguration timeouts are a common indicator of FW trouble. To make debugging easier print requested update and control word when reconfiguration fails. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: add offset to all TLV parsing errors	Jakub Kicinski	1	-8/+8
	When troubleshooting incorrect FW capabilities it's useful to know where the faulty TLV is located. Add offset to all errors messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: add offloads on representors	Jakub Kicinski	5	-0/+143
	FW/HW can generally support the standard networking offloads on representors without any trouble. Add the ability for FW to advertise which features should be available on representors. Because representors are muxed on top of the vNIC we need to listen on feature changes of their lower devices, and update their features appropriately. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: add locking around representor changes	Jakub Kicinski	3	-0/+8
	Up until now we never needed to keep a networking locks around representors accesses, we only accessed them when device was reconfigured (under nfp pf->lock) or on fast path (under RCU). Now we want to be able to iterate over all representors during notifications, so make sure representor assignment is done under RTNL lock. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: run don't require Qdiscs on representor netdevs	Jakub Kicinski	1	-0/+1
	Our representors are software devices built on top of the PF vNIC, the queuing should only happen at the vNIC netdevice. Allow representors to run qdisc-less. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: run representor TX locklessly	Jakub Kicinski	1	-0/+2
	Our representors are software devices built on top of the PF vNIC, the only state they have are per-cpu stats, so make the TX run locklessly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: avoid oversized TSO headers with metadata prepend	Jakub Kicinski	1	-1/+4
	In preparation for TSO over representors make sure the port id prepend will always fit in the frame. The current max header length is 255, which is ample, so assume worst case scenario of 8 byte prepend and save ourselves the conditionals. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: correct descriptor offsets in presence of metadata	Jakub Kicinski	1	-8/+12
	The TSO-related offsets in the descriptor should not include the length of the prepended metadata. Adjust them. Note that this could not have caused issues in the past as we don't support TSO with metadata prepend as of this patch. Signed-off-by: Michael Rapson <michael.rapson@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: move queue variable init	Jakub Kicinski	1	-1/+3
	nd_q is only used at the very end of nfp_net_tx(), there is no need to initialize it early. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: move temporary variables in nfp_net_tx_complete()	Jakub Kicinski	1	-14/+17
	Move temporary variables in scope of the loop in nfp_net_tx_complete(), and add a temp for txbuf software structure. This saves us 0.2% of CPU. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	nfp: copy only the relevant part of the TX descriptor for frags	Jakub Kicinski	2	-5/+8
	Chained descriptors for fragments need to duplicate all the descriptor fields of the skb head, so we copy the descriptor and then modify the relevant fields. This is wasteful, because the top half of the descriptor will get overwritten entirely while the bottom half is not modified at all. Copy only the bottom half. This saves us 0.3% of CPU in a GSO test. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	tcp: md5: add tcp_md5_needed jump label	Eric Dumazet	4	-10/+30
	Most linux hosts never setup TCP MD5 keys. We can avoid a cache line miss (accessing tp->md5ig_info) on RX and TX using a jump label. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	Merge branch 'tcp-take-a-bit-more-care-of-backlog-stress'	David S. Miller	5	-33/+123
	Eric Dumazet says: ==================== tcp: take a bit more care of backlog stress While working on the SACK compression issue Jean-Louis Dupond reported, we found that his linux box was suffering very hard from tail drops on the socket backlog queue. First patch hints the compiler about sack flows being the norm. Second patch changes non-sack code in preparation of the ack compression. Third patch fixes tcp_space() to take backlog into account. Fourth patch is attempting coalescing when a new packet must be added to the backlog queue. Cooking bigger skbs helps to keep backlog list smaller and speeds its handling when user thread finally releases the socket lock. v3: Neal/Yuchung feedback addressed : Do not aggregate if any skb has URG bit set. Do not aggregate if the skbs have different ECE/CWR bits v2: added feedback from Neal : tcp: take care of compressed acks in tcp_add_reno_sack() added : tcp: hint compiler about sack flows added : tcp: make tcp_space() aware of socket backlog ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	tcp: implement coalescing on backlog queue	Eric Dumazet	3	-6/+88
	In case GRO is not as efficient as it should be or disabled, we might have a user thread trapped in __release_sock() while softirq handler flood packets up to the point we have to drop. This patch balances work done from user thread and softirq, to give more chances to __release_sock() to complete its work before new packets are added the the backlog. This also helps if we receive many ACK packets, since GRO does not aggregate them. This patch brings ~60% throughput increase on a receiver without GRO, but the spectacular gain is really on 1000x release_sock() latency reduction I have measured. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	tcp: make tcp_space() aware of socket backlog	Eric Dumazet	1	-1/+1
	Jean-Louis Dupond reported poor iscsi TCP receive performance that we tracked to backlog drops. Apparently we fail to send window updates reflecting the fact that we are under stress. Note that we might lack a proper window increase when backlog is fully processed, since __release_sock() clears sk->sk_backlog.len _after_ all skbs have been processed. This should not matter in practice. If we had a significant load through socket backlog, we are in a dangerous situation. Reported-by: Jean-Louis Dupond <jean-louis@dupond.be> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Tested-by: Jean-Louis Dupond<jean-louis@dupond.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	tcp: take care of compressed acks in tcp_add_reno_sack()	Eric Dumazet	1	-25/+33
	Neal pointed out that non sack flows might suffer from ACK compression added in the following patch ("tcp: implement coalescing on backlog queue") Instead of tweaking tcp_add_backlog() we can take into account how many ACK were coalesced, this information will be available in skb_shinfo(skb)->gso_segs Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	tcp: hint compiler about sack flows	Eric Dumazet	1	-1/+1
	Tell the compiler that most TCP flows are using SACK these days. There is no need to add the unlikely() clause in tcp_is_reno(), the compiler is able to infer it. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30	net: Add trace events for all receive exit points	Geneviève Bastien	2	-6/+88
	Trace events are already present for the receive entry points, to indicate how the reception entered the stack. This patch adds the corresponding exit trace events that will bound the reception such that all events occurring between the entry and the exit can be considered as part of the reception context. This greatly helps for dependency and root cause analyses. Without this, it is not possible with tracepoint instrumentation to determine whether a sched_wakeup event following a netif_receive_skb event is the result of the packet reception or a simple coincidence after further processing by the thread. It is possible using other mechanisms like kretprobes, but considering the "entry" points are already present, it would be good to add the matching exit events. In addition to linking packets with wakeups, the entry/exit event pair can also be used to perform network stack latency analyses. Signed-off-by: Geneviève Bastien <gbastien@versatic.net> CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Ingo Molnar <mingo@redhat.com> CC: David S. Miller <davem@davemloft.net> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (tracing side) Signed-off-by: David S. Miller <davem@davemloft.net>