aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4/tcp_ipv4.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2018-10-01selftests/tls: Fix recv(MSG_PEEK) & splice() test casesVakul Garg1-10/+10
TLS test cases splice_from_pipe, send_and_splice & recv_peek_multiple_records expect to receive a given nummber of bytes and then compare them against the number of bytes which were sent. Therefore, system call recv() must not return before receiving the requested number of bytes, otherwise the subsequent memcmp() fails. This patch passes MSG_WAITALL flag to recv() so that it does not return prematurely before requested number of bytes are copied to receive buffer. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01hv_netvsc: Fix rndis_per_packet_info internal field initializationHaiyang Zhang1-0/+1
The RSC feature -- a bit field "internal" was added here with total size unchanged: struct rndis_per_packet_info { u32 size; u32 type:31; u32 internal:1; u32 ppi_offset; }; On TX path, we put rndis msg into skb head room, which is not zeroed before passing to us. We do not use the "internal" field in TX path, but it may impact older hosts which use the entire 32 bits as "type". To fix the bug, this patch sets the field "internal" to zero. Fixes: c8e4eff4675f ("hv_netvsc: Add support for LRO/RSC in the vSwitch") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: improve handling delayed workHeiner Kallweit2-15/+16
Using mod_delayed_work() allows to simplify handling delayed work and removes the need for the sync parameter in phy_trigger_machine(). Also introduce a helper phy_queue_state_machine() to encapsulate the low-level delayed work calls. No functional change intended. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: systemport: Add software counters to track reallocationsFlorian Fainelli2-0/+7
When inserting the TSB, keep track of how many times we had to do it and if there was a failure in doing so, this helps profile the driver for possibly incorrect headroom settings. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: systemport: Be drop monitor friendly while re-allocating headroomFlorian Fainelli1-1/+2
During bcm_sysport_insert_tsb() make sure we differentiate a SKB headroom re-allocation failure from the normal swap and replace path. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: systemport: Turn on offloads by defaultFlorian Fainelli1-3/+4
We can turn on the RX/TX checksum offloads by default and make sure that those are properly reflected back to e.g: stacked devices such as VLAN or DSA. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: systemport: Utilize bcm_sysport_set_features() during resume/openFlorian Fainelli1-7/+7
During driver resume and open, the HW may have lost its context/state, utilize bcm_sysport_set_features() to make sure we do restore the correct set of features that were previously configured. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: systemport: Refactor bcm_sysport_set_features()Florian Fainelli1-23/+15
In preparation for unconditionally enabling TX and RX checksum offloads, refactor bcm_sysport_set_features() a bit such that __netdev_update_features() during register_netdev() can make sure that features are correctly programmed during network device registration. Since we can now be called during register_netdev() with clocks gated, we need to temporarily turn them on/off in order to have a successful register programming. We also move the CRC forward setting read into bcm_sysport_set_features() since priv->crc_fwd matters while turning on RX checksum offload, that way we are guaranteed they are in sync in case we ever add support for NETIF_F_RXFCS at some point in the future. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net_sched: fix a crash in tc_new_tfilter()Cong Wang1-1/+3
When tcf_block_find() fails, it already rollbacks the qdisc refcnt, so its caller doesn't need to clean up this again. Avoid calling qdisc_put() again by resetting qdisc to NULL for callers. Reported-by: syzbot+37b8770e6d5a8220a039@syzkaller.appspotmail.com Fixes: e368fdb61d8e ("net: sched: use Qdisc rcu API instead of relying on rtnl lock") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01netlink: add validation function to policyJohannes Berg2-1/+30
Add the ability to have an arbitrary validation function attached to a netlink policy that doesn't already use the validation_data pointer in another way. This can be useful to validate for example the content of a binary attribute, like in nl80211 the "(information) elements", which must be valid streams of "u8 type, u8 length, u8 value[length]". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01netlink: add attribute range validation to policyJohannes Berg2-3/+136
Without further bloating the policy structs, we can overload the `validation_data' pointer with a struct of s16 min, max and use those to validate ranges in NLA_{U,S}{8,16,32,64} attributes. It may sound strange to validate NLA_U32 with a s16 max, but in many cases NLA_U32 is used for enums etc. since there's no size benefit in using a smaller attribute width anyway, due to netlink attribute alignment; in cases like that it's still useful, particularly when the attribute really transports an enum value. Doing so lets us remove quite a bit of validation code, if we can be sure that these attributes aren't used by userspace in places where they're ignored today. To achieve all this, split the 'type' field and introduce a new 'validation_type' field which indicates what further validation (beyond the validation prescribed by the type of the attribute) is done. This currently allows for no further validation (the default), as well as min, max and range checks. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: hns3: Add support for enable/disable flow directorJian Shen3-1/+28
This patch adds switch for flow director with ethtool command Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: hns3: Remove all flow director rules when unload hns3 driverJian Shen1-0/+2
This patch removes all flow director rules when unload hns3 driver. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: hns3: Add reset handle for flow directorJian Shen4-0/+103
When doing reset, remove all entries in TCAM block, and keep flow director rules list. After finishing reset, restore all entries. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: hns3: Add support for rule query of flow directorJian Shen3-6/+264
This patch adds support for querying rule number and rule details by ethtool commands. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: hns3: Add support for rule add/delete for flow directorJian Shen4-2/+519
This patch adds support for add and delete rule by ethtool commands. HNS3 driver supports several flow types, include ETHER_FLOW, IP_USER_FLOW, TCP_V4_FLOW, UDP_V4_FLOW, SCTP_V4_FLOW, IPV6_USER_FLOW, TCP_V6_FLOW, UDP_V6_FLOW and SCTP_V6_FLOW. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: hns3: Add input key and action config support for flow directorJian Shen3-0/+456
Each flow director rule consists of input key and action. The input key is the condition for matching, includes tuples of L2/L3/L4 header. Action is the behaviour when a packet matches with the input key, such as drop the packet, or forward to a specified queue. The input key is stored in the tcam blocks, Each bit of input key can be masked. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: hns3: Add flow director initializationJian Shen5-0/+349
Flow director is a new feature supported by hardware with revision 0x21. This patch adds flow direcor initialization for each PF. It queries flow director mode and tcam resource from firmware, selects tuples used for input key. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Replace phy driver features u32 with link_mode bitmapAndrew Lunn9-39/+198
This is one step in allowing phylib to make use of link_mode bitmaps, instead of u32 for supported and advertised features. Convert the phy drivers to use bitmaps to indicates the features they support. Build bitmap equivalents of the u32 values at runtime, and have the drivers point to the appropriate bitmap. These bitmaps are shared, and we don't want a driver to modify them. So mark them __ro_after_init. Within phylib, the features bitmap is currently turned back into a u32. This will be removed once the whole of phylib, and the drivers are converted to use bitmaps. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: ethernet: xgbe: expand PHY_GBIT_FEAUTRESAndrew Lunn1-4/+6
The macro PHY_GBIT_FEAUTRES needs to change into a bitmap in order to support link_modes. Remove its use from xgde by replacing it with its definition. Probably, the current behavior is wrong. It probably should be ANDing not assigning. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add limkmode equivalents to some of the MII ethtool helpersAndrew Lunn1-0/+50
Add helpers which take a linkmode rather than a u32 ethtool for advertising settings. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add helper for advertise to lcl valueAndrew Lunn8-34/+26
Add a helper to convert the local advertising to an LCL capabilities, which is then used to resolve pause flow control settings. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add helper to convert MII ADV register to a linkmodeAndrew Lunn1-0/+31
The phy_mii_ioctl can be used to write a value into the MII_ADVERTISE register in the PHY. Since this changes the state of the PHY, we need to make the same change to phydev->advertising. Add a helper which can convert the register value to a linkmode. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add phydev_info()Andrew Lunn3-7/+11
Add phydev_info() and make use of it within the phy drivers and core code. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add phydev_warn()Andrew Lunn6-24/+29
Not all new style LINK_MODE bits can be converted into old style SUPPORTED bits. We need to warn when such a conversion is attempted. Add a helper for this. Convert all pr_warn() calls to phydev_warn() where possible. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Move linkmode helpers to somewhere publicAndrew Lunn4-27/+69
phylink has some useful helpers to working with linkmode bitmaps. Move them to there own header so other code can use them. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01nfp: warn on experimental TLV typesJakub Kicinski2-0/+15
Reserve two TLV types for feature development, and warn in the driver if they ever leak into production. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: nixge: Address compiler warnings when building for i386Moritz Fischer1-7/+7
Address compiler warning reported by kbuild autobuilders when building for i386 as a result of dma_addr_t size on different architectures. warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Fixes: 7e8d5755be0e ("net: nixge: Add support for 64-bit platforms") Signed-off-by: Moritz Fischer <mdf@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01tcp: adjust rcv zerocopy hints based on frag sizesSoheil Hassas Yeganeh1-1/+10
When SKBs are coalesced, we can have SKBs with different frag sizes. Some with PAGE_SIZE and some not with PAGE_SIZE. Since recv_skip_hint is always set to the full SKB size, it can overestimate the amount that should be read using normal read for coalesced packets. Change the recv_skip_hint so that it only includes the first frags that are not of PAGE_SIZE. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01tcp: set recv_skip_hint when tcp_inq is less than PAGE_SIZESoheil Hassas Yeganeh1-5/+9
When we have less than PAGE_SIZE of data on receive queue, we set recv_skip_hint to 0. Instead, set it to the actual number of bytes available. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01tcp: start receiver buffer autotuning soonerYuchung Cheng1-1/+1
Previously receiver buffer auto-tuning starts after receiving one advertised window amount of data. After the initial receiver buffer was raised by patch a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB"), the reciver buffer may take too long to start raising. To address this issue, this patch lowers the initial bytes expected to receive roughly the expected sender's initial window. Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01ice: Change pf state behavior to protect reset pathDave Ertman4-30/+31
Currently, there is no bit, or set of bits, that protect the entirety of the reset path. If the reset is originated by the driver, then the relevant one of the following bits will be set when the reset is scheduled: __ICE_PFR_REQ __ICE_CORER_REQ __ICE_GLOBR_REQ This bit will not be cleared until after the rebuild has completed. If the reset is originated by the FW, then the first the driver knows of it will be the reception of the OICR interrupt. The __ICE_RESET_OICR_RECV bit will be set in the interrupt handler. This will also be the indicator in a SW originated reset that we have completed the pre-OICR tasks and have informed the FW that a reset was requested. To utilize these bits, change the function: ice_is_reset_recovery_pending() to be: ice_is_reset_in_progress() The new function will check all of the above bits in the pf->state and will return a true if one or more of these bits are set. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-01ice: Move common functions out of ice_main.c part 7/7Anirudh Venkataramanan3-282/+249
This patch completes the code move out of ice_main.c The following top level functions and related dependency functions) were moved to ice_lib.c: ice_vsi_setup ice_vsi_cfg_tc The following functions were made static again: ice_vsi_setup_vector_base ice_vsi_alloc_q_vectors ice_vsi_get_qs void ice_vsi_map_rings_to_vectors ice_vsi_alloc_rings ice_vsi_set_rss_params ice_vsi_set_num_qs ice_get_free_slot ice_vsi_init ice_vsi_alloc_arrays Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-01ice: Move common functions out of ice_main.c part 6/7Anirudh Venkataramanan3-450/+521
This patch continues the code move out of ice_main.c The following top level functions (and related dependency functions) were moved to ice_lib.c: ice_vsi_setup_vector_base ice_vsi_alloc_q_vectors ice_vsi_get_qs The following functions were made static again: ice_vsi_free_arrays ice_vsi_clear_rings Also, in this patch, the netdev and NAPI registration logic was de-coupled from the VSI creation logic (ice_vsi_setup) as for SR-IOV, while we want to create VF VSIs using ice_vsi_setup, we don't want to create netdevs. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-01ice: Move common functions out of ice_main.c part 5/7Anirudh Venkataramanan3-132/+141
This patch continues the code move out of ice_main.c The following top level functions (and related dependency functions) were moved to ice_lib.c: ice_vsi_clear ice_vsi_close ice_vsi_free_arrays ice_vsi_map_rings_to_vectors Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-01ice: Move common functions out of ice_main.c part 4/7Anirudh Venkataramanan3-417/+429
This patch continues the code move out of ice_main.c The following top level functions (and related dependency functions) were moved to ice_lib.c: ice_vsi_alloc_rings ice_vsi_set_rss_params ice_vsi_set_num_qs ice_get_free_slot ice_vsi_init ice_vsi_clear_rings ice_vsi_alloc_arrays Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-01ice: Move common functions out of ice_main.c part 3/7Anirudh Venkataramanan3-386/+410
This patch continues the code move out of ice_main.c The following top level functions (and related dependency functions) were moved to ice_lib.c: ice_vsi_delete ice_free_res ice_get_res ice_is_reset_recovery_pending ice_vsi_put_qs ice_vsi_dis_irq ice_vsi_free_irq ice_vsi_free_rx_rings ice_vsi_free_tx_rings ice_msix_clean_rings Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-01ice: Move common functions out of ice_main.c part 2/7Anirudh Venkataramanan3-519/+526
This patch continues the code move out of ice_main.c The following top level functions (and related dependency functions) were moved to ice_lib.c: ice_vsi_start_rx_rings ice_vsi_stop_rx_rings ice_vsi_stop_tx_rings ice_vsi_cfg_rxqs ice_vsi_cfg_txqs ice_vsi_cfg_msix Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-01ice: Move common functions out of ice_main.c part 1/7Anirudh Venkataramanan6-315/+348
The functions that are used for PF VSI/netdev setup will also be used for SR-IOV support. To allow reuse of these functions, move these functions out of ice_main.c to ice_common.c/ice_lib.c This move is done across multiple patches. Each patch moves a few functions and may have minor adjustments. For example, a function that was previously static in ice_main.c will be made non-static temporarily in its new location to allow the driver to build cleanly. These adjustments will be removed in subsequent patches where more code is moved out of ice_main.c In this particular patch, the following functions were moved out of ice_main.c: int ice_add_mac_to_list ice_free_fltr_list ice_stat_update40 ice_stat_update32 ice_update_eth_stats ice_vsi_add_vlan ice_vsi_kill_vlan ice_vsi_manage_vlan_insertion ice_vsi_manage_vlan_stripping Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-01net/mlx5: Cache the system image guidAlaa Hleihel4-2/+14
The system image guid is a read-only field which is used by the TC offloads code to determine if two mlx5 devices belong to the same ASIC while adding flows. Read this once and save it on the core device rather than querying each time an offloaded flow is added. Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Allow reporting of checksum unnecessaryOr Gerlitz4-0/+37
Currently we practically never report checksum unnecessary, because for all IP packets we take the checksum complete path. Enable non-default runs with reprorting checksum unnecessary, using an ethtool private flag. This can be useful for performance evals and other explorations. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Enable reporting checksum unnecessary also for L3 packetsOr Gerlitz1-1/+2
We can report checksum unnecessary also when the L3 checksum flag on the cqe is set and there's no L4 header. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Add ethtool control of ring params to VF representorsGavi Teitz1-0/+18
Added ethtool control to the representors for setting and querying the ring params. Signed-off-by: Gavi Teitz <gavi@mellanox.com>
2018-10-01net/mlx5e: Enable multi-queue and RSS for VF representorsGavi Teitz1-11/+129
Increased the amount of channels the representors can open to be the amount of CPUs. The default amount opened remains one. Used the standard NIC netdev functions to: * Set RSS params when building the representors' params. * Setup an indirect TIR and RQT for the representors upon initialization. * Create a TTC flow table for the representors' indirect TIR (when creating the TTC table, mlx5e_set_ttc_basic_params() is not called, in order to avoid setting the inner_ttc param, which is not needed). Added ethtool control to the representors for setting and querying the amount of open channels. Additionally, included logic in the representors' ethtool set channels handler which controls a representor's vport rx rule, so that if there is one open channel the rx rule steers traffic to the representor's direct TIR, whereas if there is more than one channel, the rx rule steers traffic to the new TTC flow table. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Expose ethtool rss key size / indirection table functionsOr Gerlitz2-2/+16
Towards enabling RSS for the vport representors, expose the functions for querying the rss hash key size and indirection table size via ethtool. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Expose function for building RSS paramsGavi Teitz2-4/+10
Towards enabling RSS for the vport representors, extract the procedure for building a device's RSS params, and expose the function. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Provide explicit directive if to create inner indirect tirsOr Gerlitz3-12/+12
Change the driver functions that deal with creating indirect tirs to get a flag telling if inner ttc is desired. A pre-step for enabling rss on the vport representors, where inner ttc is not needed. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5: E-Switch, Provide flow dest when creating vport rx ruleGavi Teitz3-7/+9
Currently the destination for the representor e-switch rx rule is a TIR number. Towards changing that to potentially be a flow table, as part of enabling RSS for representors, modify the signature of the related e-switch API to get a flow destination. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Extract creation of rep's default flow ruleGavi Teitz1-9/+16
Cleaning up the flow of the representors' rx initialization, towards enabling RSS for the representors. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Enable stateless offloads for VF representor netdevsGavi Teitz1-0/+10
Enabled checksum and TSO offloads for the representors, in order to increase their performance, which is required to increase the performance of flows that cannot be offloaded. Checksum offloads contribute to a general acceleration of all traffic (to around 150%), whereas the TSO offload contributes to a prominent acceleration of the representor's TX for traffic flows with larger than MTU sized packets (to around 200%). This is the usual case for TCP streams, as the PF, which serves as the uplink representor, and the VF representors employ GRO before forwarding the packets to the representor. GRO was enabled implicitly for the representors beforehand, and is explicitly enabled here to ensure that the representors preserve the performance boost it provides (of around 200%) when working in tandem with the TSO offload by the forwardee, which is the standard case as both the PF and the VF representors employ HW TSO. The impact of these changes can be seen in the following measurements taken on a setup of a VM over a VF, connected to OVS via the VF representor, to an external host: Before current changes: TCP Throughput [Gb/s] External host to VM ~ 10.5 VM to external host ~ 23.5 With just checksum offloads enabled: TCP Throughput [Gb/s] External host to VM ~ 14.9 VM to external host ~ 28.5 With the TSO offload also enabled: TCP Throughput [Gb/s] External host to VM ~ 30.5 Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>