Age Commit message Author Files Lines
2019-12-12ethtool: move to its own directoryMichal Kubecek4-2/+5
The ethtool netlink interface is going to be split into multiple files so that it will be more convenient to put all of them in a separate directory net/ethtool. Start by moving current ethtool.c with ioctl interface into this directory and renaming it to ioctl.c. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-12netlink: rename nl80211_validate_nested() to nla_validate_nested()Michal Kubecek2-6/+5
Function nl80211_validate_nested() is not specific to nl80211, it's a counterpart to nla_validate_nested_deprecated() with strict validation. For consistency with other validation and parse functions, rename it to nla_validate_nested(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-12rtnetlink: provide permanent hardware address in RTM_NEWLINKMichal Kubecek2-0/+6
Permanent hardware address of a network device was traditionally provided via ethtool ioctl interface but as Jiri Pirko pointed out in a review of ethtool netlink interface, rtnetlink is much more suitable for it so let's add it to the RTM_NEWLINK message. Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the permanent address is all zeros (i.e. device driver did not fill it). As permanent address is not modifiable, reject userspace requests containing IFLA_PERM_ADDRESS attribute. Note: we already provide permanent hardware address for bond slaves; unfortunately we cannot drop that attribute for backward compatibility reasons. v5 -> v6: only add the attribute if permanent address is not zero Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-12Merge branch 'unix-Show-number-of-scm-files-in-fdinfo'David S. Miller4-5/+69
Kirill Tkhai says: ==================== unix: Show number of scm files in fdinfo v2: Pass correct argument to lockdep in patch [2/2]. Unix sockets are like a black box. You never know what is pending there: there may be a file descriptor holding a mount or a block device, or there may be whole universes with namespaces, sockets with receive queues full of sockets etc. The patchset makes the number of pending scm files visible in fdinfo. This may be useful to determine that a socket should be investigated, or which task should be killed to put a reference counter on a resource. $ cat /proc/[pid]/fdinfo/[unix_sk_fd] | grep scm_fds scm_fds: 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
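The `scm_fds` field shown in the cover letter can also be read programmatically. The following is a minimal Python sketch, assuming the field appears on its own line as `scm_fds: <n>`, as in the example output above. The helper names `parse_scm_fds` and `read_scm_fds` are hypothetical and not part of the patchset.

```python
import re
from typing import Optional

# Matches a line such as "scm_fds: 1" anywhere in the fdinfo text.
_SCM_FDS_RE = re.compile(r"^scm_fds:\s*(\d+)\s*$", re.MULTILINE)


def parse_scm_fds(fdinfo_text: str) -> Optional[int]:
    """Return the scm_fds count from fdinfo text, or None if the field is absent."""
    match = _SCM_FDS_RE.search(fdinfo_text)
    return int(match.group(1)) if match else None


def read_scm_fds(pid: int, fd: int) -> Optional[int]:
    """Read /proc/<pid>/fdinfo/<fd> and return its scm_fds count, if any."""
    with open(f"/proc/{pid}/fdinfo/{fd}", encoding="ascii") as f:
        return parse_scm_fds(f.read())
```

On a kernel without this series, or for a descriptor that is not a unix socket, the field is simply missing and the sketch returns `None`.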
2019-12-12unix: Show number of pending scm files of receive queue in fdinfoKirill Tkhai2-5/+56
Unix sockets are like a black box. You never know what is stored there: there may be a file descriptor holding a mount or a block device, or there may be whole universes with namespaces, sockets with receive queues full of sockets etc. The patch adds a little debug and accounts the number of files (not recursive) which are in the receive queue of a unix socket. Sometimes this is useful to determine that a socket should be investigated, or which task should be killed to put a reference counter on a resource. v2: Pass correct argument to lockdep Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-12net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]Kirill Tkhai2-0/+13
This adds .show_fdinfo to socket_file_ops, so protocols will be able to print their specific data in fdinfo. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11Merge branch 'vsock-add-local-transport-support'David S. Miller10-72/+243
Stefano Garzarella says: ==================== vsock: add local transport support v2: - style fixes [Dave] - removed RCU sync and changed 'the_vsock_loopback' in a global static variable [Stefan] - use G2H transport when local transport is not loaded and remote cid is VMADDR_CID_LOCAL [Stefan] - rebased on net-next v1: https://patchwork.kernel.org/cover/11251735/ This series introduces a new transport (vsock_loopback) to handle local communication. This could be useful to test vsock core itself and to allow developers to test their applications without launching a VM. Before this series, vmci and virtio transports allowed this behavior, but only in the guest. We are moving the loopback handling in a new transport, because it might be useful to provide this feature also in the host or when no H2G/G2H transports (hyperv, virtio, vmci) are loaded. The user can use the loopback with the new VMADDR_CID_LOCAL (that replaces VMADDR_CID_RESERVED) in any condition. Otherwise, if the G2H transport is loaded, it can also use the guest local CID as previously supported by vmci and virtio transports. If G2H transport is not loaded, the user can also use VMADDR_CID_HOST for local communication. Patch 1 is a cleanup to build virtio_transport_common without virtio Patch 2 adds the new VMADDR_CID_LOCAL, replacing VMADDR_CID_RESERVED Patch 3 adds a new feature flag to register a loopback transport Patch 4 adds the new vsock_loopback transport based on the loopback implementation of virtio_transport Patch 5 implements the logic to use the local transport for loopback communication Patch 6 removes the loopback from virtio_transport ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11vsock/virtio: remove loopback handlingStefano Garzarella1-59/+2
We can remove the loopback handling from virtio_transport, because now the vsock core is able to handle local communication using the new vsock_loopback device. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11vsock: use local transport when it is loadedStefano Garzarella1-5/+23
Now that we have a transport that can handle the local communication, we can use it when it is loaded. A socket will use the local transport (loopback) when the remote CID is: - equal to VMADDR_CID_LOCAL - or equal to transport_g2h->get_local_cid(), if transport_g2h is loaded (this allows us to keep the same behavior implemented by virtio and vmci transports) - or equal to VMADDR_CID_HOST, if transport_g2h is not loaded Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
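The three conditions listed in this commit message can be written as a small decision function. This is a Python sketch of the rule only, not the kernel code: it assumes the local transport is already loaded, and the function name `uses_local_transport` and the convention of passing `None` when no G2H transport is loaded are hypothetical. `VMADDR_CID_LOCAL` takes the value 1 formerly used by `VMADDR_CID_RESERVED`, as stated in the series.

```python
from typing import Optional

VMADDR_CID_LOCAL = 1  # reuses the value of the former VMADDR_CID_RESERVED
VMADDR_CID_HOST = 2


def uses_local_transport(remote_cid: int, g2h_local_cid: Optional[int]) -> bool:
    """Decide whether a socket would use the loopback transport.

    g2h_local_cid is the guest's local CID reported by the G2H transport,
    or None when no G2H transport is loaded (an assumption of this sketch).
    """
    if remote_cid == VMADDR_CID_LOCAL:
        return True
    if g2h_local_cid is not None:
        # G2H loaded: keep the behavior previously offered by virtio and vmci.
        return remote_cid == g2h_local_cid
    # G2H not loaded: the host CID also means "local".
    return remote_cid == VMADDR_CID_HOST
```

For example, in a guest whose local CID is 3, both CID 1 and CID 3 select the loopback transport, while CID 2 still reaches the host.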
2019-12-11vsock: add vsock_loopback transportStefano Garzarella4-0/+194
This patch adds a new vsock_loopback transport to handle local communication. This transport is based on the loopback implementation of virtio_transport, so it uses the virtio_transport_common APIs to interface with the vsock core. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11vsock: add local transport support in the vsock coreStefano Garzarella2-1/+18
This patch allows registering a transport able to handle local communication (loopback). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11vsock: add VMADDR_CID_LOCAL definitionStefano Garzarella2-4/+6
The VMADDR_CID_RESERVED (1) was used by VMCI, but now it is not used anymore, so we can reuse it for local communication (loopback) by adding the new well-known CID: VMADDR_CID_LOCAL. Cc: Jorgen Hansen <jhansen@vmware.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11vsock/virtio_transport_common: remove unused virtio header includesStefano Garzarella1-3/+0
We can remove virtio header includes, because virtio_transport_common doesn't use virtio API, but provides common functions to interface virtio/vhost transports with the af_vsock core, and to handle the protocol. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11Merge branch 'sfp-slow-to-probe-copper'David S. Miller1-25/+66
Russell King says: ==================== Add support for slow-to-probe-PHY copper SFP modules This series, following on from the previous adding SFP+ copper support, adds support for a range of Copper SFP modules, made by a variety of companies, all of which have a Marvell 88E1111 PHY on them, but take far longer than the Marvell spec'd 15ms to start communicating on the I2C bus. Researching the Champion One 1000SFPT module reveals that TX_DISABLE is routed through a MAX1971 switching regulator and reset IC which adds a 175ms delay to releasing the 88E1111 reset. It is not known whether other modules use a similar setup, but there are a range of modules that are slow for the Marvell PHY to appear. This patch series adds support for these modules by repeatedly trying to probe the PHY for up to 600ms. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: sfp: re-attempt probing for phyRussell King1-17/+42
Some 1000BASE-T PHY modules take a while for the PHY to wake up. Retry the probe a number of times before deciding that the module has no PHY. Tested with: Sourcephotonics SPGBTXCNFC - PHY takes less than 50ms to respond. Champion One 1000SFPT - PHY takes about 200ms to respond. Mikrotik S-RJ01 - no PHY Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
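The retry idea can be illustrated with a generic bounded-retry loop. This Python sketch is not the driver's state machine: `probe_with_retries` is a hypothetical helper, and the default of 12 attempts spaced 50 ms apart is an illustrative assumption chosen to stay close to the 600 ms budget mentioned in the cover letter.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def probe_with_retries(
    probe: Callable[[], Optional[T]],
    retries: int = 12,          # assumed attempt count, not taken from the patch
    interval_s: float = 0.05,   # assumed delay between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call probe() until it returns a non-None result or the retries run out."""
    for attempt in range(retries):
        result = probe()
        if result is not None:
            return result
        if attempt < retries - 1:
            sleep(interval_s)  # give a slow PHY time to come out of reset
    return None  # conclude that the module has no PHY
```

A module like the Mikrotik S-RJ01 in the test list would exhaust every attempt and be treated as having no PHY, while a slow module answers on a later attempt.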
2019-12-11net: sfp: error handling for phy probeRussell King1-9/+17
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: sfp: rename sm_retriesRussell King1-5/+5
Rename sm_retries as sm_fault_retries, as this is what this member is tracking. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: sfp: use a definition for the fault recovery attemptsRussell King1-3/+11
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11Merge branch 'sfp-copper-modules'David S. Miller10-155/+670
Russell King says: ==================== Add support for SFP+ copper modules This series adds support for Copper SFP+ modules with Clause 45 PHYs. Specifically the patches: 1. drop support for the probably never tested 100BASE-*X modules. 2. drop EEPROM ID from sfp_select_interface() 3. add more compliance code definitions from SFF-8024, renaming the existing definitions. 4. add module start/stop methods so phylink knows when a module is about to become active. The module start method is called after we have probed for a PHY on the module. 5. move start/stop of module PHY down into phylink using the new module start/stop methods. 6. add support for Clause 45 I2C accesses, tested with Methode DM7052. Other modules appear to use the same protocol, but with slight differences, and I do not have those modules to test with. (if someone does, please holler!) 7. rearrange how we attach to PHYs so that we can support Clause 45 PHYs with indeterminate interface modes. (Clause 45 PHYs appear to like to change their PHY interface mode depending on the negotiated speed.) 8. add support for phylink to connect to a clause 45 PHY on a SFP module. 9. split the link_an_mode between the configured value and the currently selected mode value; some clause 45 PHYs have no capability to provide in-band negotiation. 10. split the link configuration on SFP module insertion in phylink so we can use it in other code paths. 11. delay MAC configuration for copper modules without a PHY to the module start method - after any module PHY has been probed. If the module has a PHY, then we setup the MAC when the PHY is detected. 12. the Broadcom 84881 PHY does not support in-band negotiation even though it uses SGMII and 2500BASE-X. Having the MAC operating with in-band negotiation enabled, even with AN bypass enabled, results in no link - Broadcom say that the host MAC must always be forced. 13. add support for the Broadcom 84881 PHY found on the Methode DM7052 module. 14. add support to SFP to probe for a Clause 45 PHY on copper SFP+ modules. v3: now bisectable! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: sfp: add support for Clause 45 PHYsRussell King1-4/+40
Some SFP+ modules have a Clause 45 PHY onboard, which is accessible via the normal I2C address. Detect 10G BASE-T PHYs which may have an accessible PHY and probe for it. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: phy: add Broadcom BCM84881 PHY driverRussell King3-0/+276
Add a rudimentary Clause 45 driver for the BCM84881 PHY, found on Methode DM7052 SFPs. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: phylink: make Broadcom BCM84881 based SFPs workRussell King1-2/+16
The Broadcom BCM84881 does not appear to send the SGMII control word when operating in SGMII mode, which causes network adapters to fail to link with the PHY, or decide to operate at fixed 1G speed, even if the PHY negotiated 100M. Work around this by detecting the Broadcom BCM84881 and switch to phy mode rather than inband mode. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: phylink: delay MAC configuration for copper SFP modulesRussell King3-10/+78
Knowing whether we need to delay the MAC configuration because a module may have a PHY is useful to phylink to allow NBASE-T modules to work on systems supporting no more than 2.5G speeds. This commit allows us to delay such configuration until after the PHY has been probed by recording the parsed capabilities, and if the module may have a PHY, doing no more until the module_start() notification is called. At that point, we either have a PHY, or we don't. We move the PHY-based setup a little later, and use the PHYs support capabilities rather than the EEPROM parsed capabilities to determine whether we can support the PHY. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: phylink: split phylink_sfp_module_insert()Russell King1-19/+28
Split out the configuration step from phylink_sfp_module_insert() so we can re-use this later. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: phylink: split link_an_mode configured and current settingsRussell King1-28/+31
Split link_an_mode between the configured setting and the current operating setting. This is an important distinction to make when we need to configure PHY mode for a plugged SFP+ module that does not use in-band signalling. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: phylink: support Clause 45 PHYs on SFP+ modulesRussell King1-5/+16
Some SFP+ modules have Clause 45 PHYs embedded on them, which need a little more handling in order to ensure that they are correctly setup, as they switch the PHY link mode according to the negotiated speed. With Clause 22 PHYs, we assumed that they would operate in SGMII mode, but this assumption is now false. Adapt phylink to support Clause 45 PHYs on SFP+ modules. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: phylink: re-split __phylink_connect_phy()Russell King1-15/+24
In order to support Clause 45 PHYs on SFP+ modules, which have an indeterminate phy interface mode, we need to be able to call phylink_bringup_phy() with a different interface mode to that used when binding the PHY. Reduce __phylink_connect_phy() to an attach operation, and move the call to phylink_bringup_phy() to its call sites. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: mdio-i2c: add support for Clause 45 accessesRussell King1-8/+20
Some SFP+ modules have PHYs on them just like SFP modules do, except they are Clause 45 PHYs. The I2C protocol used to access them is modified slightly in order to send the device address and 16-bit register index. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: sfp: move phy_start()/phy_stop() to phylinkRussell King2-2/+22
Move phy_start() and phy_stop() into the module_start and module_stop notifications in phylink, rather than having them in the SFP code. This gives phylink responsibility for controlling the PHY, rather than having SFP start and stop the PHY state machine. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: sfp: add module start/stop upstream notificationsRussell King4-0/+35
When dealing with some copper modules, we can't positively know what the module capabilities are until we have probed the PHY. Without the full capabilities, we may end up failing a module that we could otherwise drive with a restricted set of capabilities. An example of this would be a module with a NBASE-T PHY plugged into a host that supports phy interface modes 2500BASE-X and SGMII. The PHY supports 10GBASE-R, 5000BASE-X, 2500BASE-X, SGMII interface modes, which means a subset of the capabilities are compatible with the host. However, reading the module EEPROM leads us to believe that the module only supports ethtool link mode 10GBASE-T, which is incompatible with the host - and thus results in the module being rejected. This patch adds an extra notification which is triggered after the SFP module's PHY probe, and a corresponding notification just before the PHY is removed. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
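The example in this commit message boils down to a set intersection of interface modes. The Python sketch below only illustrates that arithmetic; the function name `usable_interfaces` and the lowercase mode strings are hypothetical labels, not kernel identifiers.

```python
from typing import Set


def usable_interfaces(host_modes: Set[str], phy_modes: Set[str]) -> Set[str]:
    """Return the PHY interface modes that both the host and the module PHY support."""
    return host_modes & phy_modes


# Values taken from the example in the commit message.
HOST_MODES = {"2500base-x", "sgmii"}
PHY_MODES = {"10gbase-r", "5000base-x", "2500base-x", "sgmii"}

# Non-empty once the PHY has been probed, so the module can be driven.
COMMON_MODES = usable_interfaces(HOST_MODES, PHY_MODES)
```

Before the PHY is probed only the EEPROM view (10GBASE-T) is available, which is why the decision has to wait for the module start notification.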
2019-12-11net: sfp: add more extended compliance codesRussell King3-53/+93
SFF-8024 is used to define various constants re-used in several SFF SFP-related specifications. Split these constants from the enum, and rename them to indicate that they're defined by SFF-8024. Add and use updated SFF-8024 extended compliance code definitions for 10GBASE-T, 5GBASE-T and 2.5GBASE-T modules. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: sfp: derive interface mode from ethtool link modesRussell King4-11/+6
We don't need the EEPROM ID to derive the phy interface mode as we can derive it merely from the ethtool link modes. Remove the EEPROM ID argument to sfp_select_interface(). Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-11net: sfp: remove incomplete 100BASE-FX and 100BASE-LX supportRussell King2-15/+2
The 100BASE-FX and 100BASE-LX support assumes a PHY is present; this is probably an incorrect assumption. In any case, sfp_parse_support() will fail such a module. Let's stop pretending we support these modules. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10cxgb4: add support for high priority filtersShahjada Abul Husain9-86/+234
T6 has a separate region known as high priority filter region that allows classifying packets going through ULD path. So, query firmware for HPFILTER resources and enable the high priority offload filter support when it is available. Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10enetc: remove variable 'tc_max_sized_frame' set but not usedChen Wandun1-2/+1
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/freescale/enetc/enetc_qos.c: In function enetc_setup_tc_cbs: drivers/net/ethernet/freescale/enetc/enetc_qos.c:195:6: warning: variable tc_max_sized_frame set but not used [-Wunused-but-set-variable] Fixes: c431047c4efe ("enetc: add support Credit Based Shaper(CBS) for hardware offload") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10nfp: add support for TLV device statsJakub Kicinski4-7/+242
Device stats are currently hard coded in the PCI BAR0 layout. Add the ability to read them from the TLV area instead. Names for the stats are maintained by the driver, and their meaning documented. This allows us to more easily add and remove device stats. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10tcp: Cleanup duplicate initialization of sk->sk_state.Kuniyuki Iwashima1-2/+0
When a TCP socket is created, sk->sk_state is initialized twice as TCP_CLOSE in sock_init_data() and tcp_init_sock(). The tcp_init_sock() is always called after the sock_init_data(), so it is not necessary to update sk->sk_state in the tcp_init_sock(). Before v2.1.8, the code of the two functions was in the inet_create(). In the patch of v2.1.8, the tcp_v4/v6_init_sock() were added and the code of initialization of sk->state was duplicated. Signed-off-by: Kuniyuki Iwashima <kuni1840@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10enetc: add software timestampingMichael Walle2-0/+3
Provide a software TX timestamp and add it to the ethtool query interface. skb_tx_timestamp() is also needed if one would like to use PHY timestamping. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10Merge branch 'tipc-introduce-variable-window-congestion-control'David S. Miller9-83/+172
Jon Maloy says: ==================== tipc: introduce variable window congestion control We improve throughput greatly by introducing a variant of the Reno congestion control algorithm at the link level. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10tipc: introduce variable window congestion controlJon Maloy9-79/+160
We introduce a simple variable window congestion control for links. The algorithm is inspired by the Reno algorithm, covering the 'slow start', 'congestion avoidance', and 'fast recovery' modes. - We introduce hard lower and upper window limits per link, still different and configurable per bearer type. - We introduce a 'slow start threshold' variable, initially set to the maximum window size. - We let a link start at the minimum congestion window, i.e. in slow start mode, and then let it grow rapidly (+1 per received ACK) until it reaches the slow start threshold and enters congestion avoidance mode. - In congestion avoidance mode we increment the congestion window for each window-size number of acked packets, up to a possible maximum equal to the configured maximum window. - For each non-duplicate NACK received, we drop back to fast recovery mode, by setting both the slow start threshold and the congestion window to (current_congestion_window / 2). - If the timeout handler finds that the transmit queue has not moved since the previous timeout, it drops the link back to slow start and forces a probe containing the last sent sequence number to be sent to the peer, so that this can discover the stale situation. This change does in reality have effect only on unicast ethernet transport, as we have seen that there is no room whatsoever for increasing the window max size for the UDP bearer. For now, we also choose to keep the limits for the broadcast link unchanged and equal. This algorithm seems to give a 50-100% throughput improvement for messages larger than MTU. Suggested-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
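The window rules listed in this commit message can be modelled with a small toy class. This Python sketch is a simplification under stated assumptions, not the TIPC implementation: the class name `LinkWindow` is hypothetical, each `on_ack()` call stands for one acked packet, and clamping the halved window to the minimum is an assumption the message does not spell out.

```python
class LinkWindow:
    """Toy model of a per-link congestion window with Reno-like behavior."""

    def __init__(self, min_win: int, max_win: int) -> None:
        if not 0 < min_win <= max_win:
            raise ValueError("need 0 < min_win <= max_win")
        self.min_win = min_win
        self.max_win = max_win
        self.ssthresh = max_win  # slow start threshold starts at the maximum window
        self.cwnd = min_win      # a link starts at the minimum window (slow start)
        self._acked = 0          # acked packets counted during congestion avoidance

    def on_ack(self) -> None:
        """Account for one acked packet."""
        if self.cwnd >= self.max_win:
            return
        if self.cwnd < self.ssthresh:
            self.cwnd += 1       # slow start: grow by one per ACK
            return
        self._acked += 1         # congestion avoidance: grow by one per full window
        if self._acked >= self.cwnd:
            self._acked = 0
            self.cwnd += 1

    def on_nack(self) -> None:
        """Non-duplicate NACK: halve both threshold and window (fast recovery)."""
        half = max(self.cwnd // 2, self.min_win)  # clamp is an assumption
        self.ssthresh = half
        self.cwnd = half
        self._acked = 0

    def on_stale_timeout(self) -> None:
        """Transmit queue did not move since the last timeout: back to slow start."""
        self.cwnd = self.min_win
        self._acked = 0
```

With `LinkWindow(2, 8)`, six ACKs take the window from 2 to 8, a NACK drops it to 4, and it then takes four further ACKs to reach 5.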
2019-12-10tipc: eliminate more unnecessary nacks and retransmissionsJon Maloy1-1/+5
When we increase the link transmit window we often observe the following scenario: 1) A STATE message bypasses a sequence of traffic packets and arrives far ahead of those to the receiver. STATE messages contain a 'peers_snd_nxt' field to indicate which was the last packet sent from the peer. This mechanism is intended as a last resort for the receiver to detect missing packets, e.g., during very low traffic when there is no packet flow to help early loss detection. 3) The receiving link compares the 'peers_snd_nxt' field to its own 'rcv_nxt', finds that there is a gap, and immediately sends a NACK message back to the peer. 4) When this NACK arrives at the sender, all the requested retransmissions are performed, since it is a first-time request. Just like in the scenario described in the previous commit this leads to many redundant retransmissions, with decreased throughput as a consequence. We fix this by adding two more conditions before we send a NACK in this situation. First, the deferred queue must be empty, so we cannot assume that the potential packet loss has already been detected by other means. Second, we check the 'peers_snd_nxt' field only in probe/ probe_reply messages, thus turning this into a true mechanism of last resort as it was really meant to be. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
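The two added conditions can be summarized as a predicate. This Python sketch is illustrative only: the function name `should_send_nack` is hypothetical, the gap test is an assumed plain comparison, and sequence-number wraparound is deliberately ignored.

```python
def should_send_nack(
    peers_snd_nxt: int,
    rcv_nxt: int,
    deferred_queue_len: int,
    is_probe: bool,
) -> bool:
    """Return True when a STATE message should trigger a NACK.

    Assumes plain integer sequence numbers (no wraparound handling).
    """
    gap = peers_snd_nxt > rcv_nxt              # peer has sent more than we received
    nothing_deferred = deferred_queue_len == 0  # loss not yet seen by other means
    # Only probe/probe_reply messages are allowed to trigger this last resort.
    return gap and nothing_deferred and is_probe
```

A STATE message that overtakes in-flight traffic is typically not a probe, so under this rule it no longer causes a premature NACK.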
2019-12-10tipc: eliminate gap indicator from ACK messagesJon Maloy1-5/+9
When we increase the link send window we sometimes observe the following scenario: 1) A packet #N arrives out of order far ahead of a sequence of older packets which are still under way. The packet is added to the deferred queue. 2) The missing packets arrive in sequence, and for each 16th of them an ACK is sent back to the receiver, as it should be. 3) When building those ACK messages, it is checked if there is a gap between the link's 'rcv_nxt' and the first packet in the deferred queue. This is always the case until packet number #N-1 arrives, and a 'gap' indicator is added, effectively turning them into NACK messages. 4) When those NACKs arrive at the sender, all the requested retransmissions are done, since it is a first-time request. This sometimes leads to a huge amount of redundant retransmissions, causing a drop in max throughput. This problem gets worse when we in a later commit introduce variable window congestion control, since it drops the link back to 'fast recovery' much more often than necessary. We now fix this by not sending any 'gap' indicator in regular ACK messages. We already have a mechanism for sending explicit NACKs in place, and this is sufficient to keep up the packet flow. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09ppp: Adjust indentation into ppp_async_inputNathan Chancellor1-9/+9
Clang warns: ../drivers/net/ppp/ppp_async.c:877:6: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] ap->rpkt = skb; ^ ../drivers/net/ppp/ppp_async.c:875:5: note: previous statement is here if (!skb) ^ 1 warning generated. This warning occurs because there is a space before the tab on this line. Clean up this entire block's indentation so that it is consistent with the Linux kernel coding style and clang no longer warns. Fixes: 6722e78c9005 ("[PPP]: handle misaligned accesses") Link: https://github.com/ClangBuiltLinux/linux/issues/800 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09net: smc911x: Adjust indentation in smc911x_phy_configureNathan Chancellor1-1/+1
Clang warns: ../drivers/net/ethernet/smsc/smc911x.c:939:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] if (!lp->ctl_rfduplx) ^ ../drivers/net/ethernet/smsc/smc911x.c:936:2: note: previous statement is here if (lp->ctl_rspeed != 100) ^ 1 warning generated. This warning occurs because there is a space after the tab on this line. Remove it so that the indentation is consistent with the Linux kernel coding style and clang no longer warns. Fixes: 0a0c72c9118c ("[PATCH] RE: [PATCH 1/1] net driver: Add support for SMSC LAN911x line of ethernet chips") Link: https://github.com/ClangBuiltLinux/linux/issues/796 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09net: tulip: Adjust indentation in {dmfe, uli526x}_init_moduleNathan Chancellor2-5/+6
Clang warns: ../drivers/net/ethernet/dec/tulip/uli526x.c:1812:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] switch (mode) { ^ ../drivers/net/ethernet/dec/tulip/uli526x.c:1809:2: note: previous statement is here if (cr6set) ^ 1 warning generated. ../drivers/net/ethernet/dec/tulip/dmfe.c:2217:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] switch(mode) { ^ ../drivers/net/ethernet/dec/tulip/dmfe.c:2214:2: note: previous statement is here if (cr6set) ^ 1 warning generated. This warning occurs because there is a space before the tab on these lines. Remove them so that the indentation is consistent with the Linux kernel coding style and clang no longer warns. While we are here, adjust the default block in dmfe_init_module to have a proper break between the label and assignment and add a space between the switch and opening parentheses to avoid a checkpatch warning. Fixes: e1c3e5014040 ("[PATCH] initialisation cleanup for ULI526x-net-driver") Link: https://github.com/ClangBuiltLinux/linux/issues/795 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09Merge branch 'dp83867-fix-fifo-depth'David S. Miller2-16/+58
Dan Murphy says: ==================== Fix Tx/Rx FIFO depth for DP83867 The DP83867 supports both the RGMII and SGMII modes. The Tx and Rx FIFO depths are configurable in these modes but may not be applicable for both modes. When the device is configured for RGMII mode the Tx FIFO depth is applicable and for SGMII mode both Tx and Rx FIFO depth settings are applicable. When the driver was originally written only the RGMII device was available and there were no standard fifo-depth DT properties. The patchset converts the special ti,fifo-depth property to the standard tx-fifo-depth property while still allowing the ti,fifo-depth property to be set so as to maintain backward compatibility. In addition to this change the rx-fifo-depth property support was added and only written when the device is configured for SGMII mode. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09net: phy: dp83867: Add rx-fifo-depth and tx-fifo-depthDan Murphy1-13/+49
This code changes the TI specific ti,fifo-depth to the common tx-fifo-depth property. The tx depth is applicable for both RGMII and SGMII modes of operation. rx-fifo-depth was added as well but this is only applicable for SGMII mode. So in summary: if RGMII mode, write tx fifo depth only; if SGMII mode, write both rx and tx fifo depths. If the property is not populated in the device tree then set the value to the default values. Signed-off-by: Dan Murphy <dmurphy@ti.com> Reported-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
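The summary in this commit message maps to a short selection routine. The Python sketch below models it with plain dictionaries. The function names, the dictionary keys, the lookup order (`tx-fifo-depth` before the legacy `ti,fifo-depth`), and the `default` parameter are assumptions for illustration; the real default values are not stated here.

```python
from typing import Dict


def resolve_tx_depth(props: Dict[str, int], default: int) -> int:
    """Pick the Tx FIFO depth, preferring the common property over the legacy one.

    The lookup order is an assumption of this sketch.
    """
    for name in ("tx-fifo-depth", "ti,fifo-depth"):
        if name in props:
            return props[name]
    return default


def fifo_fields_to_write(interface: str, tx_depth: int, rx_depth: int) -> Dict[str, int]:
    """Return the FIFO depth fields that apply to the given PHY interface mode."""
    if interface == "sgmii":
        return {"tx": tx_depth, "rx": rx_depth}  # SGMII: both depths apply
    if interface.startswith("rgmii"):
        return {"tx": tx_depth}                  # RGMII variants: Tx depth only
    return {}                                    # other modes: nothing (assumption)
```

An existing device tree that only sets `ti,fifo-depth` therefore keeps working, which is the backward compatibility the cover letter describes.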
2019-12-09dt-bindings: dp83867: Convert fifo-depth to common fifo-depth and make optionalDan Murphy1-3/+9
Convert the ti,fifo-depth from a TI specific property to the common tx-fifo-depth property. Also add support for the rx-fifo-depth. These are optional properties for this device and if these are not available then the fifo depths are set to device default values. Signed-off-by: Dan Murphy <dmurphy@ti.com> Reported-by: Adrian Bunk <bunk@kernel.org> CC: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09net-tcp: Disable TCP ssthresh metrics cache by defaultKevin(Yudong) Yang5-4/+24
This patch introduces a sysctl knob "net.ipv4.tcp_no_ssthresh_metrics_save" that disables TCP ssthresh metrics cache by default. Other parts of TCP metrics cache, e.g. rtt, cwnd, remain unchanged. As modern networks become more and more dynamic, the TCP metrics cache today often causes more harm than benefit. For example, the same IP address is often shared by different subscribers behind NAT in residential networks. Even if the IP address is not shared by different users, caching the slow-start threshold of a previous short flow using loss-based congestion control (e.g. cubic) often causes the future longer flows of the same network path to exit slow-start prematurely with abysmal throughput. Caching ssthresh is very risky and can lead to terrible performance. Therefore it makes sense to disable ssthresh caching by default and let administrators opt in for specific networks. This practice has also worked well for several years of deployment with CUBIC congestion control at Google. Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Kevin(Yudong) Yang <yyd@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
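The current setting can be checked from userspace through the usual `/proc/sys` mapping of sysctl names. This is a minimal Python sketch, assuming the knob is exposed as an integer where a non-zero value means ssthresh is not saved; the helper names are hypothetical.

```python
from pathlib import Path

# Standard /proc/sys path for net.ipv4.tcp_no_ssthresh_metrics_save.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_no_ssthresh_metrics_save")


def parse_sysctl_flag(text: str) -> bool:
    """Interpret sysctl file content as a boolean (non-zero means enabled)."""
    return int(text.strip()) != 0


def ssthresh_caching_disabled(path: Path = SYSCTL_PATH) -> bool:
    """Return True when the kernel is configured not to cache ssthresh."""
    return parse_sysctl_flag(path.read_text())
```

An administrator who wants the old behavior for a specific network would write 0 to the same file, for example with `sysctl -w net.ipv4.tcp_no_ssthresh_metrics_save=0`.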
2019-12-09sctp: get netns from asoc and ep baseXin Long14-62/+49
Commit 312434617cb1 ("sctp: cache netns in sctp_ep_common") set netns in asoc and ep base when they are created, and it will never change. Getting netns from asoc and ep base is a better way than calling sock_net(). This patch replaces those calls. v1->v2: - no change. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>