aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2020-01-24mptcp: parse and emit MP_CAPABLE option according to v1 specChristoph Paasch7-46/+160
This implements MP_CAPABLE options parsing and writing according to RFC 6824 bis / RFC 8684: MPTCP v1. Local key is sent on syn/ack, and both keys are sent on 3rd ack. MP_CAPABLE messages len are updated accordingly. We need the skbuff to correctly emit the above, so we push the skbuff struct as an argument all the way from tcp code to the relevant mptcp callbacks. When processing incoming MP_CAPABLE + data, build a full blown DSS-like map info, to simplify later processing. On child socket creation, we need to record the remote key, if available. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: move from sha1 (v0) to sha256 (v1)Paolo Abeni4-59/+104
For simplicity's sake use directly sha256 primitives (and pull them as a required build dep). Add optional, boot-time self-tests for the hmac function. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: add basic kselftest for mptcpFlorian Westphal8-0/+1449
Add mptcp_connect tool: xmit two files back and forth between two processes, several net namespaces including some adding delays, losses and reordering. Wrapper script tests that data was transmitted without corruption. The "-c" command line option for mptcp_connect.sh is there for debugging: The script will use tcpdump to create one .pcap file per test case, named according to the namespaces, protocols, and connect address in use. For example, the first test case writes the capture to ns1-ns1-MPTCP-MPTCP-10.0.1.1.pcap. The stderr output from tcpdump is printed after the test completes to show tcpdump's "packets dropped by kernel" information. Also check that userspace can't create MPTCP sockets when mptcp.enabled sysctl is off. The "-b" option allows to tune/lower send buffer size. "-m mmap" can be used to test blocking io. Default is non-blocking io using read/write/poll. Will run automatically on "make kselftest". Note that the default timeout of 45 seconds is used even if there is a "settings" changing it to 450. 45 seconds should be enough in most cases but this depends on the machine running the tests. A fix to correctly read the "settings" file has been proposed upstream but not applied yet. It is not blocking the execution of these new tests but it would be nice to have it: https://patchwork.kernel.org/patch/11204935/ Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: new sysctl to control the activation per NSMatthieu Baerts4-5/+146
New MPTCP sockets will return -ENOPROTOOPT if MPTCP support is disabled for the current net namespace. We are providing here a way to control access to the feature for those that need to turn it on or off. The value of this new sysctl can be different per namespace. We can then restrict the usage of MPTCP to the selected NS. In case of serious issues with MPTCP, administrators can now easily turn MPTCP off. Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: allow collapsing consecutive sendpages on the same substreamPaolo Abeni1-15/+60
If the current sendmsg() lands on the same subflow we used last, we can try to collapse the data. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: recvmsg() can drain data from multiple subflowsPaolo Abeni1-10/+168
With the previous patch in place, the msk can detect which subflow has the current map with a simple walk, let's update the main loop to always select the 'current' subflow. The exit conditions now closely mirror tcp_recvmsg() to get expected timeout and signal behavior. Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: add subflow write space signalling and mptcp_pollFlorian Westphal3-0/+57
Add new SEND_SPACE flag to indicate that a subflow has enough space to accept more data for transmission. It gets cleared at the end of mptcp_sendmsg() in case ssk has run below the free watermark. It is (re-set) from the wspace callback. This allows us to use msk->flags to determine the poll mask. Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Implement MPTCP receive pathMat Martineau7-5/+609
Parses incoming DSS options and populates outgoing MPTCP ACK fields. MPTCP fields are parsed from the TCP option header and placed in an skb extension, allowing the upper MPTCP layer to access MPTCP options after the skb has gone through the TCP stack. The subflow implements its own data_ready() ops, which ensures that the pending data is in sequence - according to MPTCP seq number - dropping out-of-seq skbs. The DATA_READY bit flag is set if this is the case. This allows the MPTCP socket layer to determine if more data is available without having to consult the individual subflows. It additionally validates the current mapping and propagates EoF events to the connection socket. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Write MPTCP DSS headers to outgoing data packetsMat Martineau4-6/+286
Per-packet metadata required to write the MPTCP DSS option is written to the skb_ext area. One write to the socket may contain more than one packet of data, which is copied to page fragments and mapped in to MPTCP DSS segments with size determined by the available page fragments and the maximum mapping length allowed by the MPTCP specification. If do_tcp_sendpages() splits a DSS segment in to multiple skbs, that's ok - the later skbs can either have duplicated DSS mapping information or none at all, and the receiver can handle that. The current implementation uses the subflow frag cache and tcp sendpages to avoid excessive code duplication. More work is required to ensure that it works correctly under memory pressure and to support MPTCP-level retransmissions. The MPTCP DSS checksum is not yet implemented. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Add setsockopt()/getsockopt() socket operationsPeter Krystad1-0/+58
set/getsockopt behaviour with multiple subflows is undefined. Therefore, for now, we return -EOPNOTSUPP unless we're in fallback mode. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Add shutdown() socket operationPeter Krystad1-0/+66
Call shutdown on all subflows in use on the given socket, or on the fallback socket. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Add key generation and token treePeter Krystad6-8/+428
Generate the local keys, IDSN, and token when creating a new socket. Introduce the token tree to track all tokens in use using a radix tree with the MPTCP token itself as the index. Override the rebuild_header callback in inet_connection_sock_af_ops for creating the local key on a new outgoing connection. Override the init_req callback of tcp_request_sock_ops for creating the local key on a new incoming connection. Will be used to obtain the MPTCP parent socket to handle incoming joins. Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Create SUBFLOW socket for incoming connectionsPeter Krystad1-5/+231
Add subflow_request_sock type that extends tcp_request_sock and add an is_mptcp flag to tcp_request_sock distinguish them. Override the listen() and accept() methods of the MPTCP socket proto_ops so they may act on the subflow socket. Override the conn_request() and syn_recv_sock() handlers in the inet_connection_sock to handle incoming MPTCP SYNs and the ACK to the response SYN. Add handling in tcp_output.c to add MP_CAPABLE to an outgoing SYN-ACK response for a subflow_request_sock. Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Handle MP_CAPABLE options for outgoing connectionsPeter Krystad9-24/+663
Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request, to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to the final ACK of the three-way handshake. Use the .sk_rx_dst_set() handler in the subflow proto to capture when the responding SYN-ACK is received and notify the MPTCP connection layer. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Associate MPTCP context with TCP socketPeter Krystad5-7/+275
Use ULP to associate a subflow_context structure with each TCP subflow socket. Creating these sockets requires new bind and connect functions to make sure ULP is set up immediately when the subflow sockets are created. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Handle MPTCP TCP optionsPeter Krystad7-1/+181
Add hooks to parse and format the MP_CAPABLE option. This option is handled according to MPTCP version 0 (RFC6824). MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in coordination with related code changes. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Add MPTCP socket stubsMat Martineau10-0/+212
Implements the infrastructure for MPTCP sockets. MPTCP sockets open one in-kernel TCP socket per subflow. These subflow sockets are only managed by the MPTCP socket that owns them and are not visible from userspace. This commit allows a userspace program to open an MPTCP socket with: sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); The resulting socket is simply a wrapper around a single regular TCP socket, without any of the MPTCP protocol implemented over the wire. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24Merge branch 'net-bridge-add-per-vlan-state-option'David S. Miller8-32/+311
Nikolay Aleksandrov says: ==================== net: bridge: add per-vlan state option This set adds the first per-vlan option - state, which uses the new vlan infrastructure that was recently added. It gives us forwarding control on per-vlan basis. The first 3 patches prepare the vlan code to support option dumping and modification. We still compress vlan ranges which have equal options, each new option will have to add its own equality check to br_vlan_opts_eq(). The vlans are created in forwarding state by default to be backwards compatible and vlan state is considered only when the port state is forwarding (more info in patch 4). I'll send the selftest for the vlan state with the iproute2 patch-set. v2: patch 3: do full (all-vlan) notification only on vlan create/delete, otherwise use the per-vlan notifications only, rework how option change ranges are detected, add more verbose error messages when setting options and add checks if a vlan should be used. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add per-vlan stateNikolay Aleksandrov6-19/+134
The first per-vlan option added is state, it is needed for EVPN and for per-vlan STP. The state allows to control the forwarding on per-vlan basis. The vlan state is considered only if the port state is forwarding in order to avoid conflicts and be consistent. br_allowed_egress is called only when the state is forwarding, but the ingress case is a bit more complicated due to the fact that we may have the transition between port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still allow the bridge to learn from the packet after vlan filtering and it will be dropped after that. Also to optimize the pvid state check we keep a copy in the vlan group to avoid one lookup. The state members are modified with *_ONCE() to annotate the lockless access. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add basic option setting supportNikolay Aleksandrov4-7/+130
This patch adds support for option modification of single vlans and ranges. It allows to only modify options, i.e. skip create/delete by using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range option changes we try to pack the notifications as much as possible. v2: do full port (all vlans) notification only when creating/deleting vlans for compatibility, rework the range detection when changing options, add more verbose extack errors and check if a vlan should be used (br_vlan_should_use checks) Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add basic option dumping supportNikolay Aleksandrov4-7/+48
We'll be dumping the options for the whole range if they're equal. The first range vlan will be used to extract the options. The commit doesn't change anything yet it just adds the skeleton for the support. The dump will happen when the first option is added. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: check port state before br_allowed_egressNikolay Aleksandrov1-1/+1
If we make sure that br_allowed_egress is called only when we have BR_STATE_FORWARDING state then we can avoid a test later when we add per-vlan state. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23Merge tag 'mlx5-updates-2020-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller12-398/+805
Saeed Mahameed says: ==================== mlx5-updates-2020-01-22 This series provides updates to mlx5 driver. 1) Misc small cleanups 2) Some SW steering updates including header copy support 3) Full ethtool statistics support for E-Switch uplink representor Some refactoring was required to share the bare-metal NIC ethtool stats with the Uplink representor. On Top of this Vlad converts the ethtool stats support in E-Swtich vports representors to use the mlx5e "stats groups" infrastructure and then applied all applicable stats to the uplink representor netdev. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23Merge branch 'Add-PHY-IDs-for-DP83825-6'David S. Miller2-4/+19
Dan Murphy says: ==================== Add PHY IDs for DP83825/6 Adding new PHY IDs for the DP83825 and DP83826 TI Ethernet PHYs to the DP83822 PHY driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: phy: DP83822: Add support for additional DP83825 devicesDan Murphy2-3/+12
Add PHY IDs for the DP83825CS, DP83825CM and the DP83825S devices to the DP83822 driver. Signed-off-by: Dan Murphy <dmurphy@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23phy: dp83826: Add phy IDs for DP83826N and 826NCDan Murphy2-2/+8
Add phy IDs to the DP83822 phy driver for the DP83826N and the DP83826NC devices. The register map and features are the same for basic enablement. Signed-off-by: Dan Murphy <dmurphy@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23Merge branch 'net-sched-add-Flow-Queue-PIE-packet-scheduler'David S. Miller6-185/+849
Gautam Ramakrishnan says: ==================== net: sched: add Flow Queue PIE packet scheduler Flow Queue PIE packet scheduler This patch series implements the Flow Queue Proportional Integral controller Enhanced (FQ-PIE) active queue Management algorithm. It is an enhancement over the PIE algorithm. It integrates the PIE aqm with a deficit round robin scheme. FQ-PIE is implemented over the latest version of PIE which uses timestamps to calculate queue delay with an additional option of using average dequeue rate to calculate the queue delay. This patch also adds a memory limit of all the packets across all queues to a default value of 32Mb. - Patch #1 - Creates pie.h and moves all small functions and structures common to PIE and FQ-PIE here. The functions are all made inline. - Patch #2 - #8 - Addresses code formatting, indentation, comment changes and rearrangement of structure members. - Patch #9 - Refactors sch_pie.c by changing arguments to calculate_probability(), [pie_]drop_early() and pie_process_dequeue() to make it generic enough to be used by sch_fq_pie.c. These functions are exported to be used by sch_fq_pie.c. - Patch #10 - Adds the FQ-PIE Qdisc. For more information: https://tools.ietf.org/html/rfc8033 Changes from v6 to v7 - Call tcf_block_put() when destroying the Qdisc as suggested by Jakub Kicinski. Changes from v5 to v6 - Rearranged struct members according to their access pattern and to remove holes. Changes from v4 to v5 - This patch series breaks down patch 1 of v4 into separate logical commits as suggested by David Miller. Changes from v3 to v4 - Used non deprecated version of nla_parse_nested - Used SZ_32M macro - Removed an unused variable - Code cleanup All suggested by Jakub and Toke. Changes from v2 to v3 - Exported drop_early, pie_process_dequeue and calculate_probability functions from sch_pie as suggested by Stephen Hemminger. Changes from v1 ( and RFC patch) to v2 - Added timestamp to calculate queue delay as recommended by Dave Taht - Packet memory limit implemented as recommended by Toke. - Added external classifier as recommended by Toke. - Used NET_XMIT_CN instead of NET_XMIT_DROP as the return value in the fq_pie_qdisc_enqueue function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: add Flow Queue PIE packet schedulerMohit P. Tahiliani5-0/+609
Principles: - Packets are classified on flows. - This is a Stochastic model (as we use a hash, several flows might be hashed to the same slot) - Each flow has a PIE managed queue. - Flows are linked onto two (Round Robin) lists, so that new flows have priority on old ones. - For a given flow, packets are not reordered. - Drops during enqueue only. - ECN capability is off by default. - ECN threshold (if ECN is enabled) is at 10% by default. - Uses timestamps to calculate queue delay by default. Usage: tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ] [ target TIME ] [ tupdate TIME ] [ alpha NUMBER ] [ beta NUMBER ] [ quantum BYTES ] [ memory_limit BYTES ] [ ecnprob PERCENTAGE ] [ [no]ecn ] [ [no]bytemode ] [ [no_]dq_rate_estimator ] defaults: limit: 10240 packets, flows: 1024 target: 15 ms, tupdate: 15 ms (in jiffies) alpha: 1/8, beta : 5/4 quantum: device MTU, memory_limit: 32 Mb ecnprob: 10%, ecn: off bytemode: off, dq_rate_estimator: off Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com> Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: export symbols to be reused by FQ-PIEMohit P. Tahiliani2-85/+97
This patch makes the drop_early(), calculate_probability() and pie_process_dequeue() functions generic enough to be used by both PIE and FQ-PIE (to be added in a future commit). The major change here is in the way the functions take in arguments. This patch exports these functions and makes FQ-PIE dependent on sch_pie. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: fix alignment in struct instancesMohit P. Tahiliani1-9/+9
Make the alignment in the initialization of the struct instances consistent in the file. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: fix commentingMohit P. Tahiliani1-5/+5
Fix punctuation and logical mistakes in the comments. The logical mistake was that "dequeue_rate" is no longer the default way to calculate queuing delay and is not needed. The default way to calculate queue delay was changed in commit cec2975f2b70 ("net: sched: pie: enable timestamp based delay calculation"). Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: improve comments and commenting styleMohit P. Tahiliani1-27/+58
Improve the comments along with the commenting style used to describe the members of the structures and their initial values in the init functions. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: rearrange structure members and their initializationsMohit P. Tahiliani2-11/+11
Rearrange the members of the structure such that closely referenced members appear together and/or fit in the same cacheline. Also, change the order of their initializations to match the order in which they appear in the structure. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: use u8 instead of bool in pie_varsMohit P. Tahiliani1-2/+2
Linux best practice recommends using u8 for true/false values in structures. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: rearrange macros in order of lengthMohit P. Tahiliani1-5/+5
Rearrange macros in order of length and align the values to improve readability. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: use U64_MAX to denote (2^64 - 1)Mohit P. Tahiliani1-2/+2
Use the U64_MAX macro to denote the constant (2^64 - 1). Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: move common code to pie.hMohit P. Tahiliani2-85/+97
This patch moves macros, structures and small functions common to PIE and FQ-PIE (to be added in a future commit) from the file net/sched/sch_pie.c to the header file include/net/pie.h. All the moved functions are made inline. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: convert suitable drivers to use phy_do_ioctl_runningHeiner Kallweit19-230/+20
Convert suitable drivers to use new helper phy_do_ioctl_running. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Timur Tabi <timur@kernel.org> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller320-1448/+7532
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-01-22 The following pull-request contains BPF updates for your *net-next* tree. We've added 92 non-merge commits during the last 16 day(s) which contain a total of 320 files changed, 7532 insertions(+), 1448 deletions(-). The main changes are: 1) function by function verification and program extensions from Alexei. 2) massive cleanup of selftests/bpf from Toke and Andrii. 3) batched bpf map operations from Brian and Yonghong. 4) tcp congestion control in bpf from Martin. 5) bulking for non-map xdp_redirect form Toke. 6) bpf_send_signal_thread helper from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-22net/mlx5e: Enable all available stats for uplink repsVlad Buslov3-33/+18
Extend stats group array of uplink representor with all stats that are available for PF in legacy mode, besides ipsec and TLS which are not supported. Don't output vport stats for uplink representor because they are already handled by 802_3 group (with different names: {tx|rx}_{bytes|packets}_phy). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-22net/mlx5e: Create q counters on uplink representorsVlad Buslov1-2/+19
Q counters were disabled for all types of representors to prevent an issue where there is not enough resources to init q counters for 127 representor instances. Enable q counters only for uplink representors to support "rx_out_of_buffer", "rx_if_down_packets" counters in ethtool. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-22net/mlx5e: Convert rep stats to mlx5e_stats_grp-based infraVlad Buslov1-43/+114
In order to support all of the supported stats that are available in legacy mode for switchdev uplink representors, convert rep stats infrastructure to reuse struct mlx5e_stats_grp that is already used when device is in legacy mode. Refactor rep code to use array of mlx5e_stats_grp structures (constructed using macros provided by stats infra) to fill/update stats, instead of fixed hardcoded set of values. This approach allows to easily extend representors with new stats types. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-22net/mlx5e: IPoIB, use separate stats groupsSaeed Mahameed3-15/+51
Don't copy all of the stats groups used for mlx5e ethernet NIC profile, have a separate stats groups for IPoIB with the set of the needed stats only. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-22net/mlx5e: Convert stats groups array to array of group pointersSaeed Mahameed4-31/+56
Convert stats groups array to array of "stats group" pointers to allow sharing and individual selection of groups per profile as illustrated in the next patches. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2020-01-22net/mlx5e: Declare stats groups via macroSaeed Mahameed3-185/+108
Introduce new macros to declare stats callbacks and groups, for better code reuse and for individual groups selection per profile which will be introduced in next patches. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2020-01-22net/mlx5e: Profile specific stats groupsSaeed Mahameed6-48/+93
Attach stats groups array to the profiles and make the stats utility functions (get_num, update, fill, fill_strings) generic and use the profile->stats_grps rather the hardcoded NIC stats groups. This will allow future extension to have per profile stats groups. In this patch mlx5e NIC and IPoIB will still share the same stats groups. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2020-01-22net/mlx5e: Move uplink rep init/cleanup code into own functionsRoi Dayan1-30/+51
Clean up the code and allows to call uplink rep init/cleanup from different location later. To be used later for a new uplink representor mode. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-22net/mlx5: DR, Allow connecting flow table to a lower/same level tableYevgeny Kliteynik1-2/+5
Allow connecting SW steering source table to a lower/same level destination table. Lifting this limitation is required to support Connection Tracking. Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-22net/mlx5: DR, Modify header copy supportHamdan Igbaria2-15/+151
Modify header supports ADD/SET and from this patch also COPY. Copy allows to copy header fields and metadata. Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-22net/mlx5: DR, Modify set action limitation extensionHamdan Igbaria1-61/+165
Modify set actions are not supported on both tx and rx, added a check for that. Also refactored the code in a way that every modify action has his own functions, this needed so in the future we could add copy action more smoothly. Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>