wireguard-linux - WireGuard for the Linux kernel

Age	Commit message	Author	Files	Lines
2020-05-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	David S. Miller	408	-1836/+3573
	Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds	107	-462/+801
	Pull networking fixes from David Miller: 1) Fix reference count leaks in various parts of batman-adv, from Xiyu Yang. 2) Update NAT checksum even when it is zero, from Guillaume Nault. 3) sk_psock reference count leak in tls code, also from Xiyu Yang. 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in fq_codel, from Eric Dumazet. 5) Fix panic in choke_reset(), also from Eric Dumazet. 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan. 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet. 8) Fix crash in x25_disconnect(), from Yue Haibing. 9) Don't pass pointer to local variable back to the caller in nf_osf_hdr_ctx_init(), from Arnd Bergmann. 10) Wireguard should use the ECN decap helper functions, from Toke Høiland-Jørgensen. 11) Fix command entry leak in mlx5 driver, from Moshe Shemesh. 12) Fix uninitialized variable access in mptcp's subflow_syn_recv_sock(), from Paolo Abeni. 13) Fix unnecessary out-of-order ingress frame ordering in macsec, from Scott Dial. 14) IPv6 needs to use a global serial number for dst validation just like ipv4, from David Ahern. 15) Fix up PTP_1588_CLOCK deps, from Clay McClure. 16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from Yoshiyuki Kurauchi. 17) Fix a regression in that dsa user port errors should not be fatal, from Florian Fainelli. 18) Fix iomap leak in enetc driver, from Dejin Zheng. 19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang. 20) Initialize protocol value earlier in neigh code paths when generating events, from Roman Mashak. 21) netdev_update_features() must be called with RTNL mutex in macsec driver, from Antoine Tenart. 22) Validate untrusted GSO packets even more strictly, from Willem de Bruijn. 23) Wireguard decrypt worker needs a cond_resched(), from Jason Donenfeld. 
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning wireguard: send/receive: cond_resched() when processing worker ringbuffers wireguard: socket: remove errant restriction on looping to self wireguard: selftests: use normal kernel stack size on ppc64 net: ethernet: ti: am65-cpsw-nuss: fix irqs type ionic: Use debugfs_create_bool() to export bool net: dsa: Do not leave DSA master with NULL netdev_ops net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred net: stricter validation of untrusted gso packets seg6: fix SRH processing to comply with RFC8754 net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms net: dsa: ocelot: the MAC table on Felix is twice as large net: dsa: sja1105: the PTP_CLK extts input reacts on both edges selftests: net: tcp_mmap: fix SO_RCVLOWAT setting net: hsr: fix incorrect type usage for protocol variable net: macsec: fix rtnl locking issue net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del() ...
2020-05-06	net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE	Pablo Neira Ayuso	3	-4/+22
	This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver that the frontend does not need counters, this hw stats type request never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests the driver to disable the stats, however, if the driver cannot disable counters, it bails out. TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_* except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED (this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*. Fixes: 319a1d19471e ("flow_offload: check for basic action hw stats type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order	Lukas Bulwahn	1	-1/+1
	Commit 9b038086f06b ("docs: networking: convert DIM to RST") added a new file entry to DYNAMIC INTERRUPT MODERATION to the end, and not following alphabetical order. So, ./scripts/checkpatch.pl -f MAINTAINERS complains: WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic order #5966: FILE: MAINTAINERS:5966: +F: lib/dim/ +F: Documentation/networking/net_dim.rst Reorder the file entries to keep MAINTAINERS nicely ordered. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Merge branch 'wireguard-fixes'	David S. Miller	6	-33/+72
	Jason A. Donenfeld says: ==================== wireguard fixes for 5.7-rc5 With Ubuntu and Debian having backported this into their kernels, we're finally seeing testing from places we hadn't seen prior, which is nice. With that comes more fixes: 1) The CI for PPC64 was running with extremely small stacks for 64-bit, causing spurious crashes in surprising places. 2) There was an old leftover routing loop restriction, which no longer makes sense given the queueing architecture, and was causing problems for people who really did want nested routing. 3) Not yielding our kthread on CONFIG_PREEMPT_VOLUNTARY systems caused RCU stalls and other issues, reported by Wang Jian, with the fix suggested by Sultan Alsawaf. 4) Clang spewed warnings in a selftest for CONFIG_IPV6=n, reported by Arnd Bergmann. 5) A complicated if statement was simplified to an assignment while also making the likely/unlikely hinting more correct and simple, and increasing readability, suggested by Sultan. Patches (2) and (3) have Fixes: lines and are probably good candidates for stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing	Jason A. Donenfeld	2	-16/+12
	It's very unlikely that send will become true. It's nearly always false between 0 and 120 seconds of a session, and in most cases becomes true only between 120 and 121 seconds before becoming false again. So, unlikely(send) is clearly the right option here. What happened before was that we had this complex boolean expression with multiple likely and unlikely clauses nested. Since this is evaluated left-to-right anyway, the whole thing got converted to unlikely. So, we can clean this up to better represent what's going on. The generated code is the same. Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
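The pattern described above can be sketched in userspace C: compute the rare condition as a plain boolean assignment, then apply a single branch hint to it. This is only an illustration. The `struct session` type, the `REKEY_AFTER_SECONDS` constant, and the `handshakes_sent` counter are hypothetical stand-ins rather than the driver's real names, and `__builtin_expect` is a GCC/Clang extension.

```c
#include <stdbool.h>
#include <stdint.h>

/* GCC/Clang branch-prediction hint, similar in spirit to the kernel's unlikely(). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the 120-second threshold mentioned in the commit. */
#define REKEY_AFTER_SECONDS 120

/* Illustrative session state, not the driver's real keypair structure. */
struct session {
	bool valid;
	bool initiator;
	uint64_t age_seconds;
};

static int handshakes_sent; /* counts how often the rare path ran */

/* Evaluate the condition once, without nested hints, then hint the branch. */
static void keep_key_fresh(const struct session *s)
{
	bool send = s->valid && s->initiator &&
		    s->age_seconds >= REKEY_AFTER_SECONDS;

	if (unlikely(send))
		handshakes_sent++; /* the driver would queue a handshake here */
}
```

Keeping the hint on the final `if` makes the intent visible in one place, which matches the commit's claim that the change is about readability rather than generated code.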
2020-05-06	wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning	Jason A. Donenfeld	1	-2/+2
	Without setting these to NULL, clang complains in certain configurations that have CONFIG_IPV6=n: In file included from drivers/net/wireguard/ratelimiter.c:223: drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized] ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count); ^~~~ drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning struct sk_buff skb4, skb6; ^ = NULL drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized] ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count); ^~~~ drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning struct ipv6hdr *hdr6; ^ We silence this warning by setting the variables to NULL as the warning suggests. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	wireguard: send/receive: cond_resched() when processing worker ringbuffers	Jason A. Donenfeld	2	-0/+6
	Users with pathological hardware reported CPU stalls on CONFIG_PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning these workers would never terminate. That turned out not to be okay on systems without forced preemption, which Sultan observed. This commit adds a cond_resched() to the bottom of each loop iteration, so that these workers don't hog the core. Note that we don't need this on the napi poll worker, since that terminates after its budget is expended. Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com> Reported-by: Wang Jian <larkwang@gmail.com> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
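A rough userspace analogue of the loop shape described above: a worker drains a ring buffer and offers the CPU to other tasks at the bottom of every iteration. This is a sketch under assumptions. The `struct ring` type and its helpers are hypothetical, and POSIX `sched_yield()` stands in for the in-kernel `cond_resched()`, which only reschedules when the scheduler has asked for it.

```c
#include <sched.h>
#include <stddef.h>

#define RING_SIZE 8 /* illustrative capacity */

/* Hypothetical single-threaded ring buffer standing in for the worker queue. */
struct ring {
	int items[RING_SIZE];
	size_t head; /* next slot to read */
	size_t tail; /* next slot to write */
};

/* Pop one item; returns 1 on success and 0 when the ring is empty. */
static int ring_pop(struct ring *r, int *out)
{
	if (r->head == r->tail)
		return 0;
	*out = r->items[r->head % RING_SIZE];
	r->head++;
	return 1;
}

/* Drain the ring, yielding after each item so the loop cannot hog a core. */
static size_t drain(struct ring *r)
{
	size_t processed = 0;
	int item;

	while (ring_pop(r, &item)) {
		(void)item;    /* real per-packet work would happen here */
		processed++;
		sched_yield(); /* userspace analogue of cond_resched() */
	}
	return processed;
}
```

If a producer keeps the ring full, the loop never exits on its own, so the yield point inside the loop body is what lets other work run.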
2020-05-06	wireguard: socket: remove errant restriction on looping to self	Jason A. Donenfeld	2	-15/+51
	It's already possible to create two different interfaces and loop packets between them. This has always been possible with tunnels in the kernel, and isn't specific to wireguard. Therefore, the networking stack already needs to deal with that. At the very least, the packet winds up exceeding the MTU and is discarded at that point. So, since this is already something that happens, there's no need to forbid the not very exceptional case of routing a packet back to the same interface; this loop is no different than others, and we shouldn't special case it, but rather rely on generic handling of loops in general. This also makes it easier to do interesting things with wireguard such as onion routing. At the same time, we add a selftest for this, ensuring that both onion routing works and infinite routing loops do not crash the kernel. We also add a test case for wireguard interfaces nesting packets and sending traffic between each other, as well as the loop in this case too. We make sure to send some throughput-heavy traffic for this use case, to stress out any possible recursion issues with the locks around workqueues. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	wireguard: selftests: use normal kernel stack size on ppc64	Jason A. Donenfeld	1	-0/+1
	While at some point it might have made sense to be running these tests on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on 64-bit powerpc in a long time, and more interesting things that we test don't really work when we deviate from the default (16k). So, we stop pushing our luck in this commit, and return to the default instead of the minimum. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ethernet: ti: am65-cpsw-nuss: fix irqs type	Grygorii Strashko	1	-2/+3
	The K3 INTA driver, which is the source of TX/RX IRQs for CPSW NUSS, defines the IRQ triggering type as EDGE by default, but the triggering type for CPSW NUSS TX/RX IRQs has to be LEVEL, as the EDGE triggering type may cause unnecessary IRQ triggering and NAPI scheduling for empty queues. It was discovered with RT-kernel. Fix it by explicitly specifying CPSW NUSS TX/RX IRQ type as IRQF_TRIGGER_HIGH. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	dsa: sja1105: dynamically allocate stats structure	Arnd Bergmann	1	-70/+74
	The addition of sja1105_port_status_ether structure into the statistics causes the frame size to go over the warning limit: drivers/net/dsa/sja1105/sja1105_ethtool.c:421:6: error: stack frame size of 1104 bytes in function 'sja1105_get_ethtool_stats' [-Werror,-Wframe-larger-than=] Use dynamic allocation to avoid this. Fixes: 336aa67bd027 ("net: dsa: sja1105: show more ethtool statistics counters for P/Q/R/S") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	ionic: Use debugfs_create_bool() to export bool	Geert Uytterhoeven	1	-2/+1
	Currently bool ionic_cq.done_color is exported using debugfs_create_u8(), which requires a cast, preventing further compiler checks. Fix this by switching to debugfs_create_bool(), and dropping the cast. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Merge branch 'ethtool-master-slave'	David S. Miller	10	-18/+240
	Oleksij Rempel says: ==================== provide support for PHY master/slave configuration changes v6: - use NL_SET_ERR_MSG_ATTR in ethnl_update_linkmodes - add sanity checks in the ioctl interface - use bool for ethnl_validate_master_slave_cfg() changes v5: - set MASTER_SLAVE_CFG_UNSUPPORTED as default value - send a netlink error message on validation error - more code fixes changes v4: - rename port_mode to master_slave - move validation code to net/ethtool/linkmodes.c - add UNSUPPORTED state and avoid sending unsupported fields - more formatting and naming fixes - tja11xx: support only force mode - tja11xx: mark state as unsupported changes v3: - provide separate field for config and state. - make state rejected on set - add validation changes v2: - change names. Use MASTER_PREFERRED instead of MULTIPORT - configure master/slave only on request. Default configuration can be provided by PHY or eeprom - status and configuration to the user space. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: phy: tja11xx: add support for master-slave configuration	Oleksij Rempel	1	-0/+43
	The TJA11xx PHYs have a vendor specific Master/Slave configuration bit, which is not compatible with IEEE 802.3-2018 spec for 100Base-T1 devices. So, provide a custom config_aneg callback to solve this problem. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	ethtool: provide UAPI for PHY master/slave configuration.	Oleksij Rempel	9	-18/+197
	This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of auto-negotiation support, we needed to be able to configure the MASTER-SLAVE role of the port manually or from an application in user space. The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to force MASTER or SLAVE role. See IEEE 802.3-2018:
	22.2.4.3.7 MASTER-SLAVE control register (Register 9)
	22.2.4.3.8 MASTER-SLAVE status register (Register 10)
	40.5.2 MASTER-SLAVE configuration resolution
	45.2.1.185.1 MASTER-SLAVE config value (1.2100.14)
	45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32)
	The MASTER-SLAVE role affects the clock configuration:
	-------------------------------------------------------------------------------
	When the PHY is configured as MASTER, the PMA Transmit function shall source TX_TCLK from a local clock source. When configured as SLAVE, the PMA Transmit function shall source TX_TCLK from the clock recovered from data stream provided by MASTER.
	     iMX6Q                    KSZ9031                 XXX
	    ------\                /-----------\        /------------\
	          |                |           |        |            |
	      MAC |<----RGMII----->| PHY Slave |<------>| PHY Master |
	          |<--- 125 MHz ---+-<------/  |        | \          |
	    ------/                \-----------/        \------------/
	                                                   ^
	                                                    \-TX_TCLK
	-------------------------------------------------------------------------------
	Since some clock or link related issues are only reproducible in a specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial to provide generic (not 100BASE-T1 specific) interface to the user space for configuration flexibility and trouble shooting. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Merge branch 'chcr-next'	David S. Miller	2	-22/+68
	Devulapally Shiva Krishna says: ==================== Crypto/chcr: Fix issues regarding algorithm implementation in driver The following series of patches fixes the issues which came during self-tests with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS enabled. Patch 1: Fixes gcm(aes) hang issue and rfc4106-gcm encryption issue. Patch 2: Fixes ctr, cbc, xts and rfc3686-ctr extra test failures. Patch 3: Fixes ccm(aes) extra test failures. Patch 4: Added support for 48 byte-key_len in aes_xts. Patch 5: fix for hmac(sha) extra test failure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Crypto/chcr: fix for hmac(sha) test fails	Devulapally Shiva Krishna	1	-1/+1
	The hmac(sha) test fails for a zero length source text data. For hmac(sha) minimum length of the data must be of block-size. So fix this by including the data_len for the last block. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Devulapally Shiva Krishna <shiva@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Crypto/chcr: support for 48 byte key_len in aes-xts	Devulapally Shiva Krishna	1	-2/+25
	Added support for 48 byte key length for aes-xts. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Devulapally Shiva Krishna <shiva@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Crypto/chcr: fix for ccm(aes) failed test	Devulapally Shiva Krishna	1	-1/+1
	The ccm(aes) test fails when req->assoclen > ~240bytes. The problem is the value assigned to auth_offset is wrong. As auth_offset is unsigned char, it can take max value as 255. So fix it by making it unsigned int. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Devulapally Shiva Krishna <shiva@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
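The truncation behind this fix is easy to reproduce in isolation. The sketch below assumes 8-bit chars and uses hypothetical helper names. It does not show how the driver actually derives `auth_offset`, only what happens when a value above 255 is stored in an `unsigned char` versus an `unsigned int`.

```c
/* Old behaviour: an offset stored in an 8-bit field wraps modulo 256. */
static unsigned int offset_as_uchar(unsigned int value)
{
	unsigned char auth_offset = (unsigned char)value;

	return auth_offset;
}

/* Fixed behaviour: a wider field keeps the full value. */
static unsigned int offset_as_uint(unsigned int value)
{
	unsigned int auth_offset = value;

	return auth_offset;
}
```

For example, an offset of 300 becomes 300 - 256 = 44 in the narrow field, so the hardware would be pointed at the wrong position in the request.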
2020-05-06	Crypto/chcr: fix ctr, cbc, xts and rfc3686-ctr failed tests	Devulapally Shiva Krishna	2	-14/+29
	This solves the following issues observed during self test when CONFIG_CRYPTO_MANAGER_EXTRA_TESTS is enabled. 1. Added fallback for cbc, ctr and rfc3686 if req->nbytes is zero and for xts added a fallback case if req->nbytes is not multiple of 16. 2. In case of cbc-aes, solved wrong iv update. When chcr_cipher_fallback() is called, used req->info pointer instead of reqctx->iv. 3. In cbc-aes decryption there was a wrong result. This occurs when chcr_cipher_fallback() is called from chcr_handle_cipher_resp(). In the fallback function iv(req->info) used is wrongly updated. So use the initial iv for this case. 4. In case of ctr-aes encryption observed wrong result. In adjust_ctr_overflow() there is condition which checks if ((bytes / AES_BLOCK_SIZE) > c), where c is the number of blocks which can be processed without iv overflow, but for the above bytes (req->nbytes < 32 , not a multiple of 16) this condition fails and the 2nd block is corrupted as it requires the rollover iv. So added a '=' condition in this to take care of this. 5. In rfc3686-ctr there was wrong result observed. This occurs when chcr_cipher_fallback() is called from chcr_handle_cipher_resp(). Here also copying initial_iv in init_iv pointer for handling the fallback case correctly. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Devulapally Shiva Krishna <shiva@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
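Item 4 above is an off-by-one in a boundary comparison, which a small model can make concrete. The sketch below is a simplified stand-in for the check described in the commit message, not the driver's code: the function name, the `inclusive` switch, and passing the 32-bit counter word directly are all assumptions made for illustration.

```c
#include <stdint.h>

#define AES_BLOCK_SIZE 16

/*
 * Limit a request so it does not run past the point where the low 32-bit
 * counter word wraps. With inclusive == 0 the comparison is the old '>',
 * with inclusive != 0 it is the fixed '>='.
 */
static uint32_t clamp_to_counter(uint32_t counter, uint32_t bytes, int inclusive)
{
	/* c: number of blocks that can be processed before the counter wraps */
	uint64_t c = (uint64_t)(uint32_t)~counter + 1;
	uint64_t blocks = bytes / AES_BLOCK_SIZE;

	if (inclusive ? blocks >= c : blocks > c)
		bytes = (uint32_t)(c * AES_BLOCK_SIZE);
	return bytes;
}
```

With the counter at 0xFFFFFFFF only one block fits before the wrap. A 24-byte request is one full block plus a partial one, so `blocks` equals `c`: the old comparison lets all 24 bytes through, while the inclusive comparison trims the request to 16 bytes and leaves the remainder for the rolled-over IV.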
2020-05-06	Crypto/chcr: fix gcm-aes and rfc4106-gcm failed tests	Devulapally Shiva Krishna	1	-4/+12
	This patch fixes two issues observed during self tests with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS enabled. 1. gcm(aes) hang issue, that happens during decryption. 2. rfc4106-gcm-aes-chcr encryption unexpectedly succeeded. For gcm-aes decryption, authtag is not mapped due to sg_nents_for_len(up to size: assoclen + cryptlen - authsize). So fix it by dma_mapping authtag. Also replaced sg_nents() with sg_nents_for_len() in case of aead_dma_unmap(). For rfc4106-gcm-aes-chcr, used crypto_ipsec_check_assoclen() for checking the validity of assoclen. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Devulapally Shiva Krishna <shiva@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Merge branch 'net-ipa-kill-endpoint-stop-workaround'	David S. Miller	4	-146/+6
	Alex Elder says: ==================== net: ipa: kill endpoint stop workaround It turns out that a workaround that performs a small DMA operation between retried attempts to stop a GSI channel is not needed for any supported hardware. The hardware quirk that required the extra DMA operation was fixed after IPA v3.1. So this series gets rid of that workaround code, along with some other code that was only present to support it. NOTE: This series depends on (and includes/duplicates) another patch that has already been committed in the net tree: 713b6ebb4c37 net: ipa: fix a bug in ipa_endpoint_stop() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: kill ipa_cmd_dma_task_32b_addr_add()	Alex Elder	2	-70/+0
	A recent commit removed the only use of ipa_cmd_dma_task_32b_addr_add(). This function (and the IPA immediate command it implements) is no longer needed, so get rid of it, along with all of the definitions associated with it. Isolate its removal in a commit so it can be easily added back again if needed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: kill ipa_endpoint_stop()	Alex Elder	2	-23/+6
	The previous commit made ipa_endpoint_stop() be a trivial wrapper around gsi_channel_stop(). Since it no longer does anything special, just open-code it in the three places it's used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: don't retry in ipa_endpoint_stop()	Alex Elder	1	-15/+2
	The only reason ipa_endpoint_stop() had a retry loop was that the just-removed workaround required an IPA DMA command to occur between attempts. The gsi_channel_stop() call that implements the stop does its own retry loop, to cover a channel's transition from started to stop-in-progress to stopped state. Get rid of the unnecessary retry loop in ipa_endpoint_stop(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: get rid of workaround in ipa_endpoint_stop()	Alex Elder	1	-38/+1
	In ipa_endpoint_stop(), a workaround is used for IPA version 3.5.1 where a 1-byte DMA request is issued between GSI channel stop retries. It turns out that this workaround is only required for IPA versions 3.1 and 3.2, and we don't support those. So remove the call to ipa_endpoint_stop_rx_dma() in that function. That leaves that function unused, so get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: fix a bug in ipa_endpoint_stop()	Alex Elder	1	-5/+2
	In ipa_endpoint_stop(), for TX endpoints we set the number of retries to 0. When we break out of the loop, retries being 0 means we return EIO rather than the value of ret (which should be 0). Fix this by using a non-zero retry count for both RX and TX channels, and just break out of the loop after calling gsi_channel_stop() for TX channels. This way only RX channels will retry, and the retry count will be non-zero at the end for TX channels (so the proper value gets returned). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 713b6ebb4c376b3fb65fdceb3b59e401c93248f9) Signed-off-by: David S. Miller <davem@davemloft.net>
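The control-flow problem described above can be modelled in a few lines. This is a sketch of the loop shape only, based on the commit message: `channel_stop_stub()`, `STOP_RX_RETRIES`, and both function names are hypothetical, and the stub always succeeds on the first attempt.

```c
#include <errno.h>
#include <stdbool.h>

#define STOP_RX_RETRIES 10 /* illustrative value */

/* Hypothetical stub: pretend the channel stops on the first attempt. */
static int channel_stop_stub(void)
{
	return 0;
}

/* Shape of the bug: TX starts with zero retries, so the final check yields -EIO. */
static int endpoint_stop_buggy(bool is_tx)
{
	unsigned int retries = is_tx ? 0 : STOP_RX_RETRIES;
	int ret;

	do {
		ret = channel_stop_stub();
		if (ret != -EAGAIN)
			break;
	} while (retries--);

	return retries ? ret : -EIO;
}

/* Shape of the fix: same non-zero count for both, TX simply never retries. */
static int endpoint_stop_fixed(bool is_tx)
{
	unsigned int retries = STOP_RX_RETRIES;
	int ret;

	do {
		ret = channel_stop_stub();
		if (ret != -EAGAIN || is_tx)
			break;
	} while (retries--);

	return retries ? ret : -EIO;
}
```

In the buggy variant a successful TX stop still reports `-EIO`, because the retry counter doubles as the success indicator and starts at zero for TX.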
2020-05-06	Merge branch 'net-ipa-kill-endpoint-delay-mode-workaround'	David S. Miller	3	-43/+49
	Alex Elder says: ==================== net: ipa: kill endpoint delay mode workaround A "delay mode" feature was put in place to work around a problem where packets could be passed to the modem before it was ready to handle them. That problem no longer exists, and we don't need the workaround any more so get rid of it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: remove endpoint delay mode feature	Alex Elder	3	-10/+1
	A "delay mode" feature was put in place to work around a problem that was observed during development of the upstream IPA driver. It used TX endpoint "delay mode" in order to prevent transmitting packets toward the modem before it was ready. A race condition that would explain the problem has long since been fixed, and we have concluded that the "delay mode" feature is no longer required. So get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: introduce ipa_endpoint_program_suspend()	Alex Elder	1	-26/+41
	Create a new helper function that encapsulates enabling or disabling suspend on an RX endpoint. It returns the previous state of the endpoint (true means suspend mode was enabled). Create another function that handles enabling or disabling delay mode on a TX endpoint. Delay mode does not work correctly on IPA version 4.2, so we don't currently use it (and shouldn't). We only set delay mode in one case, and although we don't expect an endpoint to already be in delay mode, it doesn't really matter if it was. So the delay function doesn't return a value. Stop issuing warnings if the previous suspend or delay mode state differs from what is expected. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: have ipa_endpoint_init_ctrl() return previous state	Alex Elder	1	-14/+14
	Change ipa_endpoint_init_ctrl() so it returns the previous state (whether suspend or delay mode was enabled) rather than indicating whether the request caused a change in state. This makes it easier to understand what's happening where called. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Merge branch 'net-ipa-limit-special-reset-handling'	David S. Miller	4	-23/+23
	Alex Elder says: ==================== net: ipa: limit special reset handling Some special handling done during channel reset should only be done for IPA hardware version 3.5.1. This series generalizes the meaning of a flag passed to indicate special behavior, then has the special handling be used only when appropriate. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: only reset channel twice for IPA v3.5.1	Alex Elder	1	-2/+2
	In gsi_channel_reset(), RX channels are subjected to two consecutive CHANNEL_RESET commands. This workaround should only be used for IPA version 3.5.1, and for newer hardware "can lead to unwanted behavior." Only issue the second CHANNEL_RESET command for legacy hardware. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: ipa: rename db_enable flag	Alex Elder	4	-21/+21
	In several places, a Boolean flag is used in the GSI code to indicate whether the "doorbell engine" should be enabled or not when a channel is configured. This is basically done to abstract this property from the IPA version; the GSI code doesn't otherwise "know" what the IPA hardware version is. The doorbell engine is enabled only for IPA v3.5.1, not for IPA v4.0 and later. The next patch makes another change that affects behavior during channel reset (which also involves programming the channel). It also distinguishes IPA v3.5.1 hardware from newer hardware. Rather than creating another flag whose value matches the "db_enable" value, just rename "db_enable" to be "legacy" so it can be used to signal more than just the special doorbell handling. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: dsa: Do not leave DSA master with NULL netdev_ops	Florian Fainelli	1	-1/+2
	When ndo_get_phys_port_name() for the CPU port was added we introduced an early check for when the DSA master network device in dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When we perform the teardown operation in dsa_master_ndo_teardown() we would not be checking that cpu_dp->orig_ndo_ops was successfully allocated and non-NULL initialized. With network device drivers such as virtio_net, this leads to a NPD as soon as the DSA switch hanging off of it gets torn down because we are now assigning the virtio_net device's netdev_ops a NULL pointer. Fixes: da7b9e9b00d4 ("net: dsa: Add ndo_get_phys_port_name() for CPU port") Reported-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred	Vladimir Oltean	1	-5/+3
	This was caused by a poor merge conflict resolution on my side. The "act = &cls->rule->action.entries[0];" assignment was already present in the code prior to the patch mentioned below. Fixes: e13c2075280e ("net: dsa: refactor matchall mirred action to separate function") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	Merge branch 'tcp-minor-adjustments-for-low-pacing-rates'	David S. Miller	3	-25/+22
	Eric Dumazet says: ==================== tcp: minor adjustments for low pacing rates After pacing horizon addition, we have to adjust how we arm rto timer, otherwise we might freeze very low pacing rate flows. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	tcp: defer xmit timer reset in tcp_xmit_retransmit_queue()	Eric Dumazet	1	-6/+10
	As hinted in prior change ("tcp: refine tcp_pacing_delay() for very low pacing rates"), it is probably best arming the xmit timer only when all the packets have been scheduled, rather than when the head of rtx queue has been re-sent. This does matter for flows having extremely low pacing rates, since their tp->tcp_wstamp_ns could be far in the future. Note that the regular xmit path has a stronger limit in tcp_small_queue_check(), meaning it is less likely to go beyond the pacing horizon. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	tcp: refine tcp_pacing_delay() for very low pacing rates	Eric Dumazet	3	-20/+13
	With the addition of horizon feature to sch_fq, we noticed some suboptimal behavior of extremely low pacing rate TCP flows, especially when TCP is not aware of a drop happening in lower stacks. Back in commit 3f80e08f40cd ("tcp: add tcp_reset_xmit_timer() helper"), tcp_pacing_delay() was added to estimate an extra delay to add to standard rto timers. This patch removes the skb argument from this helper and tcp_reset_xmit_timer() because it makes more sense to simply consider the time at which next packet is allowed to be sent, instead of the time of whatever packet has been sent. This avoids arming RTO timer too soon and removes spurious horizon drops. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	arm64: dts: sdm845: add IPA iommus property	Alex Elder	1	-0/+2
	Add an "iommus" property to the IPA node in "sdm845.dtsi". It is required because there are two regions of memory the IPA accesses through an SMMU. The next few patches define and map those regions. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: stricter validation of untrusted gso packets	Willem de Bruijn	1	-2/+24
	Syzkaller again found a path to a kernel crash through bad gso input: a packet with transport header extending beyond skb_headlen(skb). Tighten validation at kernel entry: - Verify that the transport header lies within the linear section. To avoid pulling linux/tcp.h, verify just sizeof tcphdr. tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use. - Match the gso_type against the ip_proto found by the flow dissector. Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
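The two checks listed above can be sketched in userspace C. Everything named here (`struct pkt_meta`, `enum gso_kind`, `gso_meta_valid`, the header-size constants) is a hypothetical stand-in chosen for illustration and not the kernel's own types or helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's packet metadata. */
enum gso_kind { GSO_NONE, GSO_TCP, GSO_UDP };

#define PROTO_TCP   6   /* IANA protocol number for TCP */
#define PROTO_UDP   17  /* IANA protocol number for UDP */
#define MIN_TCP_HDR 20  /* TCP header without options */
#define UDP_HDR     8

struct pkt_meta {
	size_t headlen;        /* bytes available in the linear section */
	size_t transport_off;  /* offset of the transport header */
	enum gso_kind gso;     /* offload type claimed by the sender */
	uint8_t ip_proto;      /* protocol found by parsing the packet */
};

static bool gso_meta_valid(const struct pkt_meta *p)
{
	size_t need;

	assert(p != NULL);

	switch (p->gso) {
	case GSO_NONE:
		return true;              /* nothing claimed, nothing to check */
	case GSO_TCP:
		if (p->ip_proto != PROTO_TCP)
			return false;     /* claimed type must match parsed protocol */
		need = MIN_TCP_HDR;
		break;
	case GSO_UDP:
		if (p->ip_proto != PROTO_UDP)
			return false;
		need = UDP_HDR;
		break;
	default:
		return false;
	}

	/* The minimal transport header must sit inside the linear section. */
	return p->transport_off <= p->headlen &&
	       p->headlen - p->transport_off >= need;
}
```

Checking only the fixed minimum header size mirrors the commit's note that the full TCP header length is validated later, before use.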
2020-05-06	seg6: fix SRH processing to comply with RFC8754	Ahmed Abdelsalam	1	-2/+8
	The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined in RFC8754. RFC8754 (section 4.1) defines the SR source node behavior which encapsulates packets into an outer IPv6 header and SRH. The SR source node encodes the full list of Segments that defines the packet path in the SRH. Then, the first segment from the list of Segments is copied into the Destination address of the outer IPv6 header and the packet is sent to the first hop in its path towards the destination. If the Segment list has only one segment, the SR source node can omit the SRH as the only segment is added in the destination address. RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not require the entire SID list to be preserved in the SRH. A reduced SRH does not contain the first segment of the related SR Policy (the first segment is the one already in the DA of the IPv6 header), and the Last Entry field is set to n-2, where n is the number of elements in the SR Policy. RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to validate the SRH (S09, S10, S11) which works for both reduced and non-reduced behaviors. This patch updates seg6_validate_srh() to validate the SRH as per RFC8754. Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
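A minimal sketch of the bounds check that accepts both reduced and non-reduced headers, assuming the RFC 8754 section 4.3.1.1 rule that the largest valid Last Entry is (Hdr Ext Len / 2) - 1 and that Segments Left may be at most Last Entry + 1. `struct srh_fields` and `srh_bounds_ok` are hypothetical names, and this is not the body of seg6_validate_srh().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the three SRH fields the check needs. */
struct srh_fields {
	uint8_t hdr_ext_len;    /* header length in 8-octet units, minus the first 8 */
	uint8_t segments_left;  /* segments still to be visited */
	uint8_t last_entry;     /* index of the last element of the Segment List */
};

static bool srh_bounds_ok(const struct srh_fields *h)
{
	int max_last_entry;

	assert(h != NULL);

	/* Each 128-bit segment occupies two 8-octet units. */
	max_last_entry = h->hdr_ext_len / 2 - 1;

	if (h->last_entry > max_last_entry)
		return false;

	/*
	 * A reduced SRH omits the first segment, so Segments Left can
	 * legitimately be one more than Last Entry.
	 */
	if (h->segments_left > h->last_entry + 1)
		return false;

	return true;
}
```

For a policy of n = 3 segments sent with a reduced SRH, the header carries two segments (Hdr Ext Len 4), Last Entry is n-2 = 1, and a Segments Left of 2 passes the check.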
2020-05-06	Merge branch 'FDB-fixes-for-Felix-and-Ocelot-switches'	David S. Miller	6	-6/+16
	Vladimir Oltean says: ==================== FDB fixes for Felix and Ocelot switches This series fixes the following problems: - Dynamically learnt addresses never expiring (neither for Ocelot nor for Felix) - Half of the FDB not visible in 'bridge fdb show' (for Felix only) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms	Vladimir Oltean	1	-2/+9
	One may notice that automatically-learnt entries 'never' expire, even though the bridge configures the address age period at 300 seconds. Actually the value written to hardware corresponds to a time interval 1000 times higher than intended, i.e. 83 hours. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
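The 83-hour figure follows from a unit mix-up that can be reproduced with two small conversions. The helper names below are hypothetical and the sketch ignores any further scaling the driver may apply to the register value.

```c
/*
 * Hypothetical helpers for the arithmetic only.
 * The bridge hands the driver an ageing time in milliseconds,
 * while the register field is interpreted in seconds.
 */
static unsigned int msecs_to_secs(unsigned int msecs)
{
	return msecs / 1000;
}

static unsigned int secs_to_whole_hours(unsigned int secs)
{
	return secs / 3600;
}
```

A 300 second ageing time arrives as 300000 ms. Converted properly it is `msecs_to_secs(300000)`, i.e. 300 seconds. Written unconverted, the hardware treats 300000 as seconds, and `secs_to_whole_hours(300000)` gives 83 hours.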
2020-05-06	net: dsa: ocelot: the MAC table on Felix is twice as large	Vladimir Oltean	6	-4/+7
	When running 'bridge fdb dump' on Felix, sometimes learnt and static MAC addresses would appear, sometimes they wouldn't. Turns out, the MAC table has 4096 entries on VSC7514 (Ocelot) and 8192 entries on VSC9959 (Felix), so the existing code from the Ocelot common library only dumped half of Felix's MAC table. They are both organized as a 4-way set-associative TCAM, so we just need a single variable indicating the correct number of rows. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
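The relationship between table size and row count for a 4-way set-associative table can be sketched as below. `mac_table_rows` and `MAC_TABLE_WAYS` are hypothetical names used for illustration; the driver's own variable for the number of rows is not shown in this log.

```c
#include <assert.h>

#define MAC_TABLE_WAYS 4u  /* entries per row in a 4-way set-associative table */

/* Number of rows a dump loop must walk to cover every entry. */
static unsigned int mac_table_rows(unsigned int entries, unsigned int ways)
{
	assert(ways > 0 && entries % ways == 0);
	return entries / ways;
}
```

With the sizes quoted above, `mac_table_rows(4096, MAC_TABLE_WAYS)` is 1024 for VSC7514 and `mac_table_rows(8192, MAC_TABLE_WAYS)` is 2048 for VSC9959, so a loop hard-coded to the smaller row count visits only half of the larger table.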
2020-05-06	Merge branch 'timer-add-fsleep-for-flexible-sleeping'	David S. Miller	3	-64/+58
	Heiner Kallweit says: ==================== timer: add fsleep for flexible sleeping Sleeping for a certain amount of time requires use of different functions, depending on the time period. Documentation/timers/timers-howto.rst explains when to use which function, and also checkpatch checks for some potentially problematic cases. So let's create a helper that automatically chooses the appropriate sleep function -> fsleep(), for flexible sleeping. Not sure why such a helper doesn't exist yet, or where the pitfall is, because it's a quite obvious idea. If the delay is a constant, then the compiler should be able to ensure that the new helper doesn't create overhead. If the delay is not constant, then the new helper can save some code. First user is the r8169 network driver. If nothing speaks against it, then this series could go through the netdev tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	r8169: use fsleep in polling functions	Heiner Kallweit	1	-64/+44
	Use new flexible sleep function fsleep() to merge the udelay and msleep polling functions. We can safely do this because no polling function is used in atomic context in this driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06	timer: add fsleep for flexible sleeping	Heiner Kallweit	2	-0/+14
	Sleeping for a certain amount of time requires use of different functions, depending on the time period. Documentation/timers/timers-howto.rst explains when to use which function, and also checkpatch checks for some potentially problematic cases. So let's create a helper that automatically chooses the appropriate sleep function -> fsleep(), for flexible sleeping. If the delay is a constant, then the compiler should be able to ensure that the new helper doesn't create overhead. If the delay is not constant, then the new helper can save some code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
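How a helper might choose among the sleep primitives can be sketched in userspace C. The sketch only reports which kind of sleep would be selected instead of sleeping. The names `pick_sleep`, `enum sleep_kind`, and `usecs_to_msecs_round_up` are hypothetical, and the two cut-offs (10 microseconds and 20 milliseconds) are assumptions based on the usual guidance in timers-howto, not values stated in this log.

```c
/* Hypothetical classification of the available sleep primitives. */
enum sleep_kind {
	SLEEP_BUSY_WAIT,  /* udelay()-style spin for very short delays */
	SLEEP_HRTIMER,    /* usleep_range()-style sleep for short delays */
	SLEEP_JIFFIES,    /* msleep()-style sleep for long delays */
};

/* Assumed cut-offs, in microseconds. */
#define BUSY_WAIT_MAX_US 10UL
#define HRTIMER_MAX_US   20000UL

static enum sleep_kind pick_sleep(unsigned long usecs)
{
	if (usecs <= BUSY_WAIT_MAX_US)
		return SLEEP_BUSY_WAIT;
	if (usecs <= HRTIMER_MAX_US)
		return SLEEP_HRTIMER;
	return SLEEP_JIFFIES;
}

/* Round microseconds up to whole milliseconds for the msleep()-style case. */
static unsigned long usecs_to_msecs_round_up(unsigned long usecs)
{
	return (usecs + 999) / 1000;
}
```

When the argument is a compile-time constant, a compiler can usually fold the comparisons away and keep only the selected branch, which is the "no overhead" case the message describes.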
2020-05-06	ipv6: Implement draft-ietf-6man-rfc4941bis	Fernando Gont	3	-54/+40
	Implement the upcoming rev of RFC4941 (IPv6 temporary addresses): https://tools.ietf.org/html/draft-ietf-6man-rfc4941bis-09 * Reduces the default Valid Lifetime to 2 days. The number of extra addresses employed when Valid Lifetime was 7 days exacerbated the stress caused on network elements/devices. Additionally, the motivation for temporary addresses is indeed privacy and reduced exposure. With a default Valid Lifetime of 7 days, an address that becomes revealed by active communication is reachable and exposed for one whole week. The only use case for a Valid Lifetime of 7 days could be some application that is expecting to have long-lived connections. But if you want to have long-lived connections, you shouldn't be using a temporary address in the first place. Additionally, in the era of mobile devices, general applications should nevertheless be prepared for and robust to address changes (e.g. nodes swap wifi <-> 4G, etc.) * Employs different IIDs for different prefixes, to avoid network activity correlation among addresses configured for different prefixes. * Uses a simpler algorithm for IID generation. No need to store "history" anywhere. Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: David S. Miller <davem@davemloft.net>