aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers/pci/controller/cadence/pcie-cadence-host.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2022-10-31ptp: xgbe: convert to .adjfine and adjust_by_scaled_ppmJacob Keller1-16/+4
The xgbe implementation of .adjfreq is implemented in terms of a straight forward "base * ppb / 1 billion" calculation. Convert this driver to .adjfine and use adjust_by_scaled_ppm to calculate the new addend value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31ptp: ravb: convert to .adjfine and adjust_by_scaled_ppmJacob Keller1-12/+5
The ravb implementation of .adjfreq is implemented in terms of a straight forward "base * ppb / 1 billion" calculation. Convert this driver to .adjfine and use the adjust_by_scaled_ppm helper function to calculate the new addend. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Sergey Shtylyov <s.shtylyov@omp.ru> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Cc: linux-renesas-soc@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31ptp: lan743x: use diff_by_scaled_ppm in .adjfine implementationJacob Keller1-13/+6
Update the lan743x driver to use the recently added diff_by_scaled_ppm helper function. This reduces the amount of code required in lan743x_ptp.c driver file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Bryan Whitehead <bryan.whitehead@microchip.com> Cc: UNGLinuxDriver@microchip.com Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31ptp: lan743x: remove .adjfreq implementationJacob Keller1-35/+0
The lan743x driver implements both .adjfreq and .adjfine, but the core PTP subsystem prefers .adjfine if implemented. There is no reason to carry a .adjfreq implementation, so we can remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Bryan Whitehead <bryan.whitehead@microchip.com> Cc: UNGLinuxDriver@microchip.com Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31ptp: mlx5: convert to .adjfine and adjust_by_scaled_ppmJacob Keller1-16/+6
The mlx5 implementation of .adjfreq is implemented in terms of a straight forward "base * ppb / 1 billion" calculation. Convert this to the .adjfine interface and use adjust_by_scaled_ppm for the calculation of the new mult value. Note that the mlx5_ptp_adjfreq_real_time function expects input in terms of ppb, so use the scaled_ppm_to_ppb to convert before passing to this function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Shirly Ohnona <shirlyo@nvidia.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Gal Pressman <gal@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Aya Levin <ayal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31ptp: mlx4: convert to .adjfine and adjust_by_scaled_ppmJacob Keller1-18/+11
The mlx4 implementation of .adjfreq is implemented in terms of a straight forward "base * ppb / 1 billion" calculation. Convert this driver to .adjfine and use adjust_by_scaled_ppm to perform the calculation. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31drivers: convert unsupported .adjfreq to .adjfineJacob Keller3-6/+6
A few PTP drivers implement a .adjfreq handler which indicates the operation is not supported. Convert all of these to .adjfine. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Vivek Thampi <vithampi@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31ptp: introduce helpers to adjust by scaled parts per millionJacob Keller6-79/+60
Many drivers implement the .adjfreq or .adjfine PTP op function with the same basic logic: 1. Determine a base frequency value 2. Multiply this by the abs() of the requested adjustment, then divide by the appropriate divisor (1 billion, or 65,536 billion). 3. Add or subtract this difference from the base frequency to calculate a new adjustment. A few drivers need the difference and direction rather than the combined new increment value. I recently converted the Intel drivers to .adjfine and the scaled parts per million (65.536 parts per billion) logic. To avoid overflow with minimal loss of precision, mul_u64_u64_div_u64 was used. The basic logic used by all of these drivers is very similar, and leads to a lot of duplicate code to perform the same task. Rather than keep this duplicate code, introduce diff_by_scaled_ppm and adjust_by_scaled_ppm. These helper functions calculate the difference or adjustment necessary based on the scaled parts per million input. The diff_by_scaled_ppm function returns true if the difference should be subtracted, and false otherwise. Update the Intel drivers to use the new helper functions. Other vendor drivers will be converted to .adjfine and this helper function in the following changes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31ptp: add missing documentation for parametersJacob Keller1-0/+7
The ptp_find_pin_unlocked function and the ptp_system_timestamp structure didn't document their parameters and fields. Fix this. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31net: phy: Add driver for Motorcomm yt8521 gigabit ethernet phyFrank3-3/+1635
Add a driver for the motorcomm yt8521 gigabit ethernet phy. We have verified the driver on StarFive VisionFive development board, which is developed by Shanghai StarFive Technology Co., Ltd.. On the board, yt8521 gigabit ethernet phy works in utp mode, RGMII interface, supports 1000M/100M/10M speeds, and wol(magic package). Signed-off-by: Frank <Frank.Sae@motor-comm.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31net: microchip: sparx5: kunit test: change test_callbacks and test_vctrl to staticYang Yingliang1-2/+2
test_callbacks and test_vctrl are only used in vcap_api_kunit.c now, change them to static. Fixes: 67d637516fa9 ("net: microchip: sparx5: Adding KUNIT test for the VCAP API") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31net: geneve: fix array of flexible structures warningsJakub Kicinski1-1/+1
New compilers don't like flexible array of flexible structs: include/net/geneve.h:62:34: warning: array of flexible structures Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31net: hns: hnae: remove unnecessary __module_get() and module_put()Yang Yingliang1-3/+0
hnae_ae_register() is called from hns_dsaf_probe(), the refcount of module hnae has already be got in resolve_symbol() while calling the function, so the __module_get()/module_put() can be removed. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31netlink: split up copies in the ack constructionJakub Kicinski3-9/+43
Clean up the use of unsafe_memcpy() by adding a flexible array at the end of netlink message header and splitting up the header and data copies. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31drivers: net: convert to boolean for the mac_managed_pm flagDenis Kirjanov3-4/+4
Signed-off-by: Dennis Kirjanov <dkirjanov@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31dt-bindings: net: snps,dwmac: Document queue config subnodesSebastian Reichel1-81/+264
The queue configuration is referenced by snps,mtl-rx-config and snps,mtl-tx-config. Some in-tree DTs and the example put the referenced config nodes directly beneath the root node, but most in-tree DTs put it as child node of the dwmac node. This adds proper description for this setup, which has the advantage of validating the queue configuration node content. The example is also updated to use the sub-node style, incl. the axi bus configuration node, which got the same treatment as the queues config in 5361660af6d3 ("dt-bindings: net: snps,dwmac: Document stmmac-axi-config subnode"). Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-30net: remove unused netdev_unregistering()Juhee Kang1-5/+0
Currently, use dev->reg_state == NETREG_UNREGISTERING to check the status which is NETREG_UNREGISTERING, rather than using netdev_unregistering. Also, A helper function which is netdev_unregistering on nedevice.h is no longer used. Thus, netdev_unregistering removes from netdevice.h. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28net: ipa: record and use the number of defined endpoint IDsAlex Elder2-3/+7
Define a new field in the IPA structure that records the maximum number of entries that will be used in the IPA endpoint array. Use that value rather than IPA_ENDPOINT_MAX to determine the end condition for two loops that iterate over all endpoints. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: ipa: determine the maximum endpoint IDAlex Elder2-34/+39
Each endpoint ID has an entry in the IPA endpoint array. But the size of that array is defined at compile time. Instead, rename ipa_endpoint_data_valid() to be ipa_endpoint_max() and have it return the maximum endpoint ID defined in configuration data. That function will still validate configuration data. Zero is returned on error; it's a valid endpoint ID, but we need more than one, so it can't be the maximum. The next patch makes use of the returned maximum value. Finally, rename the "initialized" mask of endpoints defined by configuration data to be "defined". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: ipa: refactor endpoint loopsAlex Elder1-6/+6
Change two functions that iterate over all endpoints to use while loops, using "endpoint_id" as the index variables in both spots. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: ipa: more completely check endpoint validityAlex Elder1-29/+36
Ensure all defined TX endpoints are in the range [0, CONS_PIPES) and defined RX endpoints are within [PROD_LOWEST, PROD_LOWEST+PROD_PIPES). Modify the way local variables are used to make the checks easier to understand. Check for each endpoint being in valid range in the loop, and drop the logical-AND check of initialized against unavailable IDs. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: ipa: no more global filtering starting with IPA v5.0Alex Elder1-24/+35
IPA v5.0 eliminates the global filter table entry. As a result, there is no need to shift the filtered endpoint bitmap when it is written to IPA local memory. Update comments to explain this. Also delete a redundant block of comments above the function. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: ipa: change an IPA v5.0 memory requirementAlex Elder1-1/+4
Don't require IPA v5.0 to have a STATS_TETHERING memory region. Downstream defines its size to 0, so it apparently is unused. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: ipa: define IPA v5.0Alex Elder1-0/+3
In preparation for adding support for IPA v5.0, define it as an understood version. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net/packet: add PACKET_FANOUT_FLAG_IGNORE_OUTGOINGWillem de Bruijn2-0/+2
Extend packet socket option PACKET_IGNORE_OUTGOING to fanout groups. The socket option sets ptype.ignore_outgoing, which makes dev_queue_xmit_nit skip the socket. When the socket joins a fanout group, the option is not reflected in the struct ptype of the group. dev_queue_xmit_nit only tests the fanout ptype, so the flag is ignored once a socket joins a fanout group. Inheriting the option from a socket would change established behavior. Different sockets in the group can set different flags, and can also change them at runtime. Testing in packet_rcv_fanout defeats the purpose of the original patch, which is to avoid skb_clone in dev_queue_xmit_nit (esp. for MSG_ZEROCOPY packets). Instead, introduce a new fanout group flag with the same behavior. Tested with https://github.com/wdebruij/kerneltools/blob/master/tests/test_psock_fanout_ignore_outgoing.c Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221027211014.3581513-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28ice: Add additional CSR registers to ETHTOOL_GREGSLukasz Czapnik1-0/+169
In the event of a Tx hang it can be useful to read a variety of hardware registers to capture some state about why the transmit queue got stuck. Extend the ETHTOOL_GREGS dump provided by the ice driver with several CSR registers that provide such relevant information regarding the hardware Tx state. This enables capturing relevant data to enable debugging such a Tx hang. Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Link: https://lore.kernel.org/r/20221027104239.1691549-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: sfp: move field definitions along side register indexRussell King (Oracle)1-10/+17
Just as we do for the A2h enum, arrange the A0h enum to have the field definitions next to their corresponding register index. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: sfp: convert register indexes from hex to decimalRussell King (Oracle)1-90/+90
The register indexes in the standards are in decimal rather than hex, so lets specify them in decimal in the header file so we can easily cross-reference without converting between hex and decimal. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: add support for in-band 802.3z negotiationRussell King (Oracle)1-35/+42
As a result of help from Frank Wunderlich to investigate and test, we now know how to program this PCS for in-band 802.3z negotiation. Add support for this by moving the contents of the two functions into the common mtk_pcs_config() function and adding the register settings for 802.3z negotiation. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: move and correct link timer programmingRussell King (Oracle)1-5/+8
Program the link timer appropriately for the interface mode being used, using the newly introduced phylink helper that provides the nanosecond link timer interval. The intervals are 1.6ms for SGMII based protocols and 10ms for 802.3z based protocols. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: add advertisement programmingRussell King (Oracle)1-1/+12
Program the advertisement into the mtk PCS block. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: move interface speed selectionRussell King (Oracle)1-8/+10
Move the selection of the underlying interface speed to the pcs_config function, so we always program the interface speed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: move PHY power upRussell King (Oracle)1-7/+4
The PHY power up is common to both configuration paths, so move it into the parent function. We need to do this for all serdes modes. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: add out of band forcing of speed and duplex in pcs_link_upRussell King (Oracle)1-11/+17
Add support for forcing the link speed and duplex setting in the pcs_link_up() method for out of band modes, which will be useful when we finish converting the pcs_config() method. Until then, we still have to force duplex for 802.3z modes to work correctly. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: convert mtk_sgmii to use regmap_update_bits()Russell King (Oracle)1-35/+26
mtk_sgmii does a lot of read-modify-write operations, for which there is a specific regmap function. Use this function instead of open-coding the operations. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: add pcs_get_state() implementationRussell King (Oracle)1-0/+15
Add a pcs_get_state() implementation which uses the advertisements to compute the resulting link modes, and BMSR contents to determine negotiation and link status. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: eliminate unnecessary error handlingRussell King (Oracle)1-12/+6
The functions called by the pcs_config() method always return zero, so there is no point trying to handle an error from these functions. Make these functions void, eliminate the "err" variable and simply return zero from the pcs_config() function itself. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: mtk_eth_soc: add definitions for PCSRussell King (Oracle)1-3/+10
As a result of help from Frank Wunderlich to investigate and test, we know a bit more about the PCS on the Mediatek platforms. Update the definitions from this investigation. This PCS appears similar, but not identical to the Lynx PCS. Although not included in this patch, but for future reference, the PHY ID registers at offset 4 read as 0x4d544950 'MTIP'. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: phylink: add phylink_get_link_timer_ns() helperRussell King (Oracle)1-0/+24
Add a helper to convert the PHY interface mode to the required link timer setting as stated by the appropriate standard. Inappropriate interface modes return an error. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: Remove the obsolte u64_stats_fetch_*_irq() users (net).Thomas Gleixner16-45/+44
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers).Thomas Gleixner63-274/+274
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28nfc: s3fwrn5: use devm_clk_get_optional_enabled() helperDmitry Torokhov1-14/+5
Because we enable the clock immediately after acquiring it in probe, we can combine the 2 operations and use devm_clk_get_optional_enabled() helper. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28net: txgbe: Set MAC address and register netdevJiawen Wu7-7/+566
Add MAC address related operations, and register netdev. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28net: txgbe: Reset hardwareJiawen Wu9-9/+434
Reset and initialize the hardware by configuring the MAC layer. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28net: txgbe: Store PCI infoJiawen Wu9-10/+280
Get PCI config space info, set LAN id and check flash status. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28tcp: add rcv_wnd and plb_rehash to TCP_INFOMubashir Adnan Qureshi2-0/+7
rcv_wnd can be useful to diagnose TCP performance where receiver window becomes the bottleneck. rehash reports the PLB and timeout triggered rehash attempts by the TCP connection. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28tcp: add u32 counter in tcp_sock and an SNMP counter for PLBMubashir Adnan Qureshi6-0/+9
A u32 counter is added to tcp_sock for counting the number of PLB triggered rehashes for a TCP connection. An SNMP counter is also added to count overall PLB triggered rehash events for a host. These counters are hooked up to PLB implementation for DCTCP. TCP_NLA_REHASH is added to SCM_TIMESTAMPING_OPT_STATS that reports the rehash attempts triggered due to PLB or timeouts. This gives a historical view of sustained congestion or timeouts experienced by the TCP connection. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28tcp: add support for PLB in DCTCPMubashir Adnan Qureshi1-1/+22
PLB support is added to TCP DCTCP code. As DCTCP uses ECN as the congestion signal, PLB also uses ECN to make decisions whether to change the path or not upon sustained congestion. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28tcp: add PLB functionality for TCPMubashir Adnan Qureshi4-2/+137
Congestion control algorithms track PLB state and cause the connection to trigger a path change when either of the 2 conditions is satisfied: - No packets are in flight and (# consecutive congested rounds >= sysctl_tcp_plb_idle_rehash_rounds) - (# consecutive congested rounds >= sysctl_tcp_plb_rehash_rounds) A round (RTT) is marked as congested when congestion signal (ECN ce_ratio) over an RTT is greater than sysctl_tcp_plb_cong_thresh. In the event of RTO, PLB (via tcp_write_timeout()) triggers a path change and disables congestion-triggered path changes for random time between (sysctl_tcp_plb_suspend_rto_sec, 2*sysctl_tcp_plb_suspend_rto_sec) to avoid hopping onto the "connectivity blackhole". RTO-triggered path changes can still happen during this cool-off period. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28tcp: add sysctls for TCP PLB parametersMubashir Adnan Qureshi4-0/+131
PLB (Protective Load Balancing) is a host based mechanism for load balancing across switch links. It leverages congestion signals(e.g. ECN) from transport layer to randomly change the path of the connection experiencing congestion. PLB changes the path of the connection by changing the outgoing IPv6 flow label for IPv6 connections (implemented in Linux by calling sk_rethink_txhash()). Because of this implementation mechanism, PLB can currently only work for IPv6 traffic. For more information, see the SIGCOMM 2022 paper: https://doi.org/10.1145/3544216.3544226 This commit adds new sysctl knobs and sets their default values for TCP PLB. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>