Age | Commit message | Author | Files | Lines
2019-11-07 | tcp: Remove one extra ktime_get_ns() from cookie_init_timestamp | Eric Dumazet | 3 | -6/+12
tcp_make_synack() already uses tcp_clock_ns(), and can pass the value to cookie_init_timestamp() to avoid another call to ktime_get_ns() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | selftests: Add source route tests to fib_tests | David Ahern | 1 | -1/+51
Add tests to verify routes with source address set are deleted when source address is deleted. Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | inetpeer: fix data-race in inet_putpeer / inet_putpeer | Eric Dumazet | 1 | -2/+10
We need to explicitly forbid read/store tearing in inet_peer_gc() and inet_putpeer(). The following syzbot report reminds us about inet_putpeer() running without a lock held.
BUG: KCSAN: data-race in inet_putpeer / inet_putpeer
write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 0:
 inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240
 ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102
 inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228
 __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
 rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157
 rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377
 rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0xbb/0xe0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x1af/0x280 kernel/sched/idle.c:263
write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 1:
 inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240
 ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102
 inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228
 __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
 rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157
 rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377
 rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 4b9d9be839fd ("inetpeer: remove unused list")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | net: phy: at803x: add missing dependency on CONFIG_REGULATOR | Madalin Bucur | 1 | -0/+1
Compilation fails on PPC targets as CONFIG_REGULATOR is not set and drivers/regulator/devres.c is not compiled in while functions exported there are used by drivers/net/phy/at803x.c. Here's the error log:
 LD .tmp_vmlinux1
 drivers/net/phy/at803x.o: In function `at803x_rgmii_reg_set_voltage_sel':
 drivers/net/phy/at803x.c:294: undefined reference to `.rdev_get_drvdata'
 drivers/net/phy/at803x.o: In function `at803x_rgmii_reg_get_voltage_sel':
 drivers/net/phy/at803x.c:306: undefined reference to `.rdev_get_drvdata'
 drivers/net/phy/at803x.o: In function `at8031_register_regulators':
 drivers/net/phy/at803x.c:359: undefined reference to `.devm_regulator_register'
 drivers/net/phy/at803x.c:365: undefined reference to `.devm_regulator_register'
 drivers/net/phy/at803x.o:(.data.rel+0x0): undefined reference to `regulator_list_voltage_table'
 linux/Makefile:1074: recipe for target 'vmlinux' failed
 make[1]: *** [vmlinux] Error 1
Fixes: 2f664823a470 ("net: phy: at803x: add device tree binding")
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | dpaa2-eth: add ethtool MAC counters | Ioana Ciornei | 6 | -1/+213
When a DPNI is connected to a MAC, export its associated counters. Ethtool related functions are added in dpaa2_mac for returning the number of counters, their strings and also their values. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | enetc: ethtool: add wake-on-lan callbacks | Michael Walle | 1 | -0/+27
If there is an external PHY, pass the wake-on-lan request to the PHY. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | enetc: add ioctl() support for PHY-related ops | Michael Walle | 1 | -1/+4
If there is an attached PHY try to handle the requested ioctl with its handler, which allows the userspace to access PHY registers, for example. This will make mii-diag and similar tools work. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | mlxsw: spectrum: Fix error return code in mlxsw_sp_port_module_info_init() | Wei Yongjun | 1 | -1/+3
Fix to return negative error code -ENOMEM from the error handling case instead of 0, as done elsewhere in this function. Fixes: 4a7f970f1240 ("mlxsw: spectrum: Replace port_to_module array with array of structs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
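The bug class described here can be shown in miniature: an error path jumps to cleanup without assigning the return variable, so the caller sees success. The following is a minimal userspace sketch, not the driver code; `port_info`, `alloc_modules`, `module_info_init` and the `fail_alloc` hook are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for driver state; not the mlxsw structures. */
struct port_info {
    int *modules;
};

/* Test hook (assumption): when nonzero, the allocation is forced to fail. */
static int fail_alloc;

static void *alloc_modules(size_t n)
{
    return fail_alloc ? NULL : calloc(n, sizeof(int));
}

/* Returns 0 on success or a negative errno value on failure. */
static int module_info_init(struct port_info *info, size_t n)
{
    int err = 0;

    assert(info != NULL);
    info->modules = alloc_modules(n);
    if (!info->modules) {
        err = -ENOMEM;  /* without this assignment the caller would see 0 */
        goto out;
    }
out:
    return err;
}
```

With `fail_alloc` set, `module_info_init()` reports `-ENOMEM`; dropping the assignment in the error branch would reproduce the "returns 0 on failure" behaviour the commit message describes.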
2019-11-07 | Merge branch 'cxgb4-add-support-for-TC-MQPRIO-Qdisc-Offload' | David S. Miller | 15 | -447/+2289
Rahul Lakkireddy says:
====================
cxgb4: add support for TC-MQPRIO Qdisc Offload
This series of patches adds support for offloading TC-MQPRIO Qdisc to Chelsio T5/T6 NICs. Offloading QoS traffic shaping and pacing requires using Ethernet Offload (ETHOFLD) resources available on Chelsio NICs. The ETHOFLD resources are configured by firmware and taken from the resource pool shared with other Chelsio Upper Layer Drivers. Traffic flowing through ETHOFLD region requires a software netdev Tx queue (EOSW_TXQ) exposed to networking stack, and an underlying hardware Tx queue (EOHW_TXQ) used for sending packets through hardware.
ETHOFLD region is addressed using EOTIDs, which are a per-connection resource. Hence, EOTIDs are capable of storing only a very small number of packets in flight. To allow more connections to share the QoS rate limiting configuration, multiple EOTIDs must be allocated to reduce packet drops. EOTIDs are 1-to-1 mapped with software EOSW_TXQ. Several software EOSW_TXQs can post packets to a single hardware EOHW_TXQ.
The series is broken down as follows:
Patch 1 queries firmware for maximum available traffic classes, as well as, start and maximum available indices (EOTID) into ETHOFLD region, supported by the underlying device.
Patch 2 reworks queue configuration and simplifies MSI-X allocation logic in preparation for ETHOFLD queues support.
Patch 3 adds skeleton for validating and configuring TC-MQPRIO Qdisc offload. Also, adds support for software EOSW_TXQs and exposes them to network stack. Updates Tx queue selection to use fallback NIC Tx path for unsupported traffic that can't go through ETHOFLD queues.
Patch 4 adds support for managing hardware queues to rate limit traffic flowing through them. The queues are allocated/removed based on enabling/disabling TC-MQPRIO Qdisc offload, respectively.
Patch 5 adds Tx path for traffic flowing through software EOSW_TXQ and EOHW_TXQ. Also, adds Rx path to handle Tx completions.
Patch 6 updates existing SCHED API to configure FLOWC based QoS offload. In the existing QUEUE based rate limiting, multiple queues sharing a traffic class get the aggregated max rate limit value. On the other hand, in FLOWC based rate limiting, multiple queues sharing a traffic class get their own individual max rate limit value. For example, if 2 queues are bound to class 0, which is rate limited to 1 Gbps, then in QUEUE based rate limiting, both the queues get the aggregate max output of 1 Gbps only. In FLOWC based rate limiting, each queue gets its own output of max 1 Gbps each; i.e. 2 queues * 1 Gbps rate limit = 2 Gbps max output.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | cxgb4: add FLOWC based QoS offload | Rahul Lakkireddy | 7 | -53/+495
Rework SCHED API to allow offloading TC-MQPRIO QoS configuration. The existing QUEUE based rate limiting throttles all queues sharing a traffic class, to the specified max rate limit value. So, if multiple queues share a traffic class, then all the queues get the aggregate specified max rate limit. So, introduce the new FLOWC based rate limiting, where multiple queues can share a traffic class with each queue getting its own individual specified max rate limit. For example, if 2 queues are bound to class 0, which is rate limited to 1 Gbps, then 2 queues using QUEUE based rate limiting, get the aggregate output of 1 Gbps only. In FLOWC based rate limiting, each queue gets its own output of max 1 Gbps each; i.e. 2 queues * 1 Gbps rate limit = 2 Gbps. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
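The arithmetic in the example above can be written down as a tiny model. This is only a sketch of the two sharing rules as the commit message states them; the enum labels and `max_class_output_mbps()` are hypothetical and do not correspond to anything in the cxgb4 driver.

```c
#include <assert.h>

/* Hypothetical labels for the two modes described in the commit message. */
enum rate_limit_mode { RL_QUEUE, RL_FLOWC };

/*
 * Upper bound on the combined output (in Mbps) of nqueues queues bound to
 * one traffic class whose configured limit is class_rate_mbps.
 */
static unsigned long max_class_output_mbps(enum rate_limit_mode mode,
                                           unsigned int nqueues,
                                           unsigned long class_rate_mbps)
{
    assert(nqueues > 0);
    if (mode == RL_QUEUE)
        return class_rate_mbps;            /* all queues share one limit */
    return nqueues * class_rate_mbps;      /* each queue gets its own limit */
}
```

For two queues on a class limited to 1000 Mbps, the QUEUE rule gives 1000 Mbps in total and the FLOWC rule gives 2000 Mbps, matching the "2 queues * 1 Gbps" figure in the text.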
2019-11-07 | cxgb4: add Tx and Rx path for ETHOFLD traffic | Rahul Lakkireddy | 5 | -49/+415
Implement Tx path for traffic flowing through software EOSW_TXQ and EOHW_TXQ. Since multiple EOSW_TXQ can post packets to a single EOHW_TXQ, protect the hardware queue with necessary spinlock. Also, move common code used to generate TSO work request to a common function. Implement Rx path to handle Tx completions for successfully transmitted packets. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | cxgb4: add ETHOFLD hardware queue support | Rahul Lakkireddy | 9 | -63/+419
Add support for configuring and managing ETHOFLD hardware queues. Keep the queue count and MSI-X allocation scheme same as NIC queues. ETHOFLD hardware queues are dynamically allocated/destroyed as TC-MQPRIO Qdisc offload is enabled/disabled on the corresponding interface, respectively. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | cxgb4: parse and configure TC-MQPRIO offload | Rahul Lakkireddy | 7 | -50/+597
Add logic for validation and configuration of TC-MQPRIO Qdisc offload. Also, add support to manage EOSW_TXQ, which have 1-to-1 mapping with EOTIDs, and expose them to network stack. Move common skb validation in Tx path to a separate function and add minimal Tx path for ETHOFLD. Update Tx queue selection to return normal NIC Txq to send traffic pattern that can't go through ETHOFLD Tx path. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | cxgb4: rework queue config and MSI-X allocation | Rahul Lakkireddy | 4 | -246/+323
Simplify queue configuration and MSI-X allocation logic. Use a single MSI-X information table for both NIC and ULDs. Remove hard-coded MSI-X indices for firmware event queue and non data interrupts. Instead, use the MSI-X bitmap to obtain a free MSI-X index dynamically. Save each Rxq's index into the MSI-X information table, within the Rxq structures themselves, for easier cleanup. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 | cxgb4: query firmware for QoS offload resources | Rahul Lakkireddy | 4 | -8/+62
QoS offload needs Ethernet Offload (ETHOFLD) resources present in the NIC. These resources are shared with other ULDs. So, query firmware for the available number of traffic classes, as well as, start and end indices (EOTID) of the ETHOFLD region. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net_sched: gen_estimator: extend packet counter to 64bit | Eric Dumazet | 1 | -2/+2
I forgot to change last_packets field in struct net_rate_estimator. Without this fix, rate estimators would misbehave after more than 2^32 packets have been sent. Another solution would be to be careful and only use the 32 least significant bits of packets counters, but we have a hole in net_rate_estimator structure and this looks easier to read/maintain. Fixes: d0083d98f685 ("net_sched: extend packet counter to 64bit") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
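Why a 32-bit saved counter misbehaves once the live counter passes 2^32 can be seen from the subtraction alone. The sketch below is a model of that arithmetic under the assumption that the estimator computes "current minus last sample"; the function names are hypothetical and this is not the estimator code.

```c
#include <assert.h>
#include <stdint.h>

/* Packets since the last sample when the saved value is only 32 bits wide. */
static uint64_t delta_with_u32_last(uint64_t packets, uint32_t last_packets)
{
    /* last_packets lost its upper bits when it was stored */
    return packets - last_packets;
}

/* Same computation with a 64-bit saved value. */
static uint64_t delta_with_u64_last(uint64_t packets, uint64_t last_packets)
{
    assert(packets >= last_packets);  /* counter only grows */
    return packets - last_packets;
}
```

If the previous sample was taken at 2^32 + 5 packets and the counter now reads 2^32 + 15, the 64-bit version yields 10, while the truncated version yields 2^32 + 10, which would look like an enormous burst to a rate computation.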
2019-11-06 | dpaa2-ptp: fix compile error | Chenwandun | 1 | -0/+1
phylink_set_port_modes() is compiled only if CONFIG_PHYLINK is enabled, while dpaa2_mac_validate() is compiled if CONFIG_FSL_DPAA2_ETH is enabled. Since dpaa2_mac_validate() calls phylink_set_port_modes(), CONFIG_PHYLINK should be selected. Otherwise the build fails:
 drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_validate':
 dpaa2-mac.c:(.text+0x3a1): undefined reference to `phylink_set_port_modes'
 drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_connect':
 dpaa2-mac.c:(.text+0x91a): undefined reference to `phylink_create'
 dpaa2-mac.c:(.text+0x94e): undefined reference to `phylink_of_phy_connect'
 dpaa2-mac.c:(.text+0x97f): undefined reference to `phylink_destroy'
 drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_disconnect':
 dpaa2-mac.c:(.text+0xa9f): undefined reference to `phylink_disconnect_phy'
 dpaa2-mac.c:(.text+0xab0): undefined reference to `phylink_destroy'
 make: *** [vmlinux] Error 1
Fixes: 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue | David S. Miller | 12 | -90/+494
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-11-06
This series contains updates to ice driver only.
Scott adds ethtool -m support so that we can read eeprom data on SFP/OSFP modules.
Anirudh updates the return value to properly reflect when SRIOV is not supported.
Md Fahad updates the driver to handle a change in the NVM, where the boot configuration section was moved to the Preserved Field Area (PFA) of the NVM.
Paul resolves an issue where transmit hangs could occur when DCBx requests non-contiguous TCs, by configuring a default traffic class (TC0) in these cases. Also adds a print statement to notify the user when unsupported modules are inserted.
Bruce fixes up the driver unload code flow to ensure we do not clear the interrupt scheme until the reset is complete, otherwise a hardware error may occur.
Dave updates the DCB initialization to set is_sw_lldp boolean when the firmware has been detected to be in an untenable state. This will ensure that the firmware is in a known state.
Michal saves off the PCI state and I/O BARs address after PCI bus reset so that after the reset, device registers can be read. Also adds a NULL pointer check to prevent a potential kernel panic.
Mitch resolves an issue where VFs on PFs other than 0 were not seeing resets by using the per-PF VF ID instead of the absolute VF ID.
Krzysztof does some code cleanup to remove an unneeded wrapper and reduce the code complexity.
Brett reduces confusion by changing the name of ice_vc_dis_vf() to ice_vc_reset_vf() to better describe what the function is actually doing.
v2: dropped patch 3 "ice: Add support for FW recovery mode detection" from the original series, while Ani makes changes based on community feedback to implement devlink into the changes.
v3: dropped patch 1 "ice: implement set_eeprom functionality" due to a bug found and additional changes will be needed when Ani implements devlink in the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: dsa: mv8e6xxx: Fix stub function parameters | Andrew Lunn | 1 | -1/+2
mv88e6xxx_g2_atu_stats_get() takes two parameters. Make the stub function also take two, otherwise we get compile errors. Fixes: c5f299d59261 ("net: dsa: mv88e6xxx: global1_atu: Add helper for get next") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
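The general pattern behind this kind of fix is that a configuration-dependent stub must have exactly the same parameter list as the real function, so callers compile either way. The sketch below uses made-up names (`HAVE_GLOBAL2`, `atu_stats_get`, `read_stats`) and an assumed `-EOPNOTSUPP` return value; it illustrates the idea and does not reproduce the driver header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct chip;  /* opaque, hypothetical */

#ifdef HAVE_GLOBAL2  /* hypothetical build switch */
int atu_stats_get(struct chip *chip, uint16_t *stats);
#else
/* The stub takes the same two parameters as the real function. */
static inline int atu_stats_get(struct chip *chip, uint16_t *stats)
{
    (void)chip;
    (void)stats;
    return -EOPNOTSUPP;  /* assumed error code for "feature not built in" */
}
#endif

/* The caller compiles unchanged whichever variant is selected. */
static int read_stats(struct chip *chip, uint16_t *out)
{
    assert(out != NULL);
    return atu_stats_get(chip, out);
}
```

If the stub took only one parameter, `read_stats()` would fail to compile whenever the feature is disabled, which is the situation the commit message describes.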
2019-11-06 | Merge branch 'net-phy-at803x-device-tree-binding' | David S. Miller | 5 | -16/+441
Michael Walle says:
====================
net: phy: at803x device tree binding
Adds a device tree binding to configure the clock and the RGMII voltage.
Changes since v1:
- rebased to latest net-next
- renamed "Atheros" to "Qualcomm Atheros"
- add a new patch to remove config_init() from AR9331
Changes since the RFC:
- renamed the Kconfig entry to "Qualcomm Atheros.." and reordered the item
- renamed the prefix from atheros to qca
- use the correct name AR803x (instead of AT803x) in new files and dt-bindings.
- listed the PHY maintainers in the new schema. Hopefully, that's ok.
- fixed a typo in the bindings schema
- run dtb_checks and dt_binding_check and fixed the schema
- dropped the rgmii-io-1v8 property; instead provide two regulators vddh and vddio, add one consumer vddio-supply
- fix the clock settings for the AR8030/AR8035
- only the AR8031 supports changing the LDO and the PLL mode in software. Check if we have the correct PHY.
- new patch to mention the AR8033 which is the same as the AR8031 just without PTP support
- new patch which corrects any displayed PHY names and comments. Be consistent.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: phy: at803x: remove config_init for AR9331 | Michael Walle | 1 | -2/+0
According to its datasheet, the internal PHY doesn't have debug registers nor MMDs. Since config_init() only configures delays and clocks and so on in these registers it won't be needed on this PHY. Remove it. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: phy: at803x: fix the PHY names | Michael Walle | 1 | -9/+9
Fix at least the displayed strings. The actual name of the chip is AR803x. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: phy: at803x: mention AR8033 as same as AR8031 | Michael Walle | 2 | -6/+8
The AR8033 is the AR8031 without PTP support. All other registers are the same. Unfortunately, they share the same PHY ID. Therefore, we cannot distinguish between the one with PTP support and the one without. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: phy: at803x: add device tree binding | Michael Walle | 1 | -1/+300
Add support for configuring the CLK_25M pin as well as the RGMII I/O voltage by the device tree. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | dt-bindings: net: phy: Add support for AT803X | Michael Walle | 3 | -0/+126
Document the Atheros AR803x PHY bindings. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: phy: at803x: fix Kconfig description | Michael Walle | 1 | -5/+5
The name of the PHY is actually AR803x not AT803x. Additionally, add the name of the vendor and mention the AR8031 support. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | tcp: fix data-race in tcp_recvmsg() | Eric Dumazet | 1 | -8/+6
Reading tp->recvmsg_inq after socket lock is released raises a KCSAN warning [1].
Replace has_tss & has_cmsg by cmsg_flags and make sure to not read tp->recvmsg_inq a second time.
[1]
BUG: KCSAN: data-race in tcp_chrono_stop / tcp_recvmsg
write to 0xffff888126adef24 of 2 bytes by interrupt on cpu 0:
 tcp_chrono_set net/ipv4/tcp_output.c:2309 [inline]
 tcp_chrono_stop+0x14c/0x280 net/ipv4/tcp_output.c:2338
 tcp_clean_rtx_queue net/ipv4/tcp_input.c:3165 [inline]
 tcp_ack+0x274f/0x3170 net/ipv4/tcp_input.c:3688
 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
 tcp_v4_rcv+0x19dc/0x1bb0 net/ipv4/tcp_ipv4.c:1942
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5214
 napi_skb_finish net/core/dev.c:5677 [inline]
 napi_gro_receive+0x28f/0x330 net/core/dev.c:5710
read to 0xffff888126adef25 of 1 bytes by task 7275 on cpu 1:
 tcp_recvmsg+0x77b/0x1a30 net/ipv4/tcp.c:2187
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1889 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414
 __vfs_read+0xb1/0xc0 fs/read_write.c:427
 vfs_read fs/read_write.c:461 [inline]
 vfs_read+0x143/0x2c0 fs/read_write.c:446
 ksys_read+0xd5/0x1b0 fs/read_write.c:587
 __do_sys_read fs/read_write.c:597 [inline]
 __se_sys_read fs/read_write.c:595 [inline]
 __x64_sys_read+0x4c/0x60 fs/read_write.c:595
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7275 Comm: sshd Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: b75eba76d3d7 ("tcp: send in-queue bytes in cmsg upon read")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: silence data-races on sk_backlog.tail | Eric Dumazet | 4 | -9/+9
sk->sk_backlog.tail might be read without holding the socket spinlock, so we need to add proper READ_ONCE()/WRITE_ONCE() to silence the warnings.
KCSAN reported:
BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
write to 0xffff8881265109f8 of 8 bytes by interrupt on cpu 1:
 __sk_add_backlog include/net/sock.h:907 [inline]
 sk_add_backlog include/net/sock.h:938 [inline]
 tcp_add_backlog+0x476/0xce0 net/ipv4/tcp_ipv4.c:1759
 tcp_v4_rcv+0x1a70/0x1bd0 net/ipv4/tcp_ipv4.c:1947
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:4929
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5043
 netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5133
 napi_skb_finish net/core/dev.c:5596 [inline]
 napi_gro_receive+0x28f/0x330 net/core/dev.c:5629
 receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
 virtnet_receive drivers/net/virtio_net.c:1323 [inline]
 virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
 napi_poll net/core/dev.c:6311 [inline]
 net_rx_action+0x3ae/0xa90 net/core/dev.c:6379
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0xbb/0xe0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 do_IRQ+0xa6/0x180 arch/x86/kernel/irq.c:263
 ret_from_intr+0x0/0x19
 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x1af/0x280 kernel/sched/idle.c:263
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
 start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
read to 0xffff8881265109f8 of 8 bytes by task 8057 on cpu 0:
 tcp_recvmsg+0x46e/0x1b40 net/ipv4/tcp.c:2050
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1889 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414
 __vfs_read+0xb1/0xc0 fs/read_write.c:427
 vfs_read fs/read_write.c:461 [inline]
 vfs_read+0x143/0x2c0 fs/read_write.c:446
 ksys_read+0xd5/0x1b0 fs/read_write.c:587
 __do_sys_read fs/read_write.c:597 [inline]
 __se_sys_read fs/read_write.c:595 [inline]
 __x64_sys_read+0x4c/0x60 fs/read_write.c:595
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 8057 Comm: syz-fuzzer Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
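The READ_ONCE()/WRITE_ONCE() annotation mentioned in this commit can be approximated in userspace with volatile casts. The macros below are a simplified stand-in, not the kernel definitions (which are more elaborate), and they rely on the GNU `__typeof__` extension; `struct backlog`, `backlog_add()` and `backlog_maybe_nonempty()` are hypothetical names loosely modelled on the description.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified approximations (assumption); requires GCC or Clang. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct node {
    struct node *next;
};

/* Hypothetical queue with head and tail pointers. */
struct backlog {
    struct node *head;
    struct node *tail;
};

/* Writer side: in the real code this would run with the lock held. */
static void backlog_add(struct backlog *b, struct node *n)
{
    assert(n != NULL);
    n->next = NULL;
    if (!b->tail)
        b->head = n;
    else
        b->tail->next = n;
    WRITE_ONCE(b->tail, n);  /* marked store, paired with the reader below */
}

/* Lockless reader: one marked load of the tail pointer. */
static int backlog_maybe_nonempty(struct backlog *b)
{
    return READ_ONCE(b->tail) != NULL;
}
```

The volatile access asks the compiler to perform the load or store exactly once rather than splitting, repeating or caching it; it does not add any ordering or mutual exclusion, so the reader's answer is only a hint.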
2019-11-06 | dpaa2-eth: fix an always true condition in dpaa2_mac_get_if_mode | Ioana Ciornei | 1 | -5/+10
Convert the phy_mode() function to return the if_mode through an argument, similar to the new form of of_get_phy_mode(). This will help with handling errors in a common manner and also will fix an always true condition. Fixes: 0c65b2b90d13 ("net: of_get_phy_mode: Change API to solve int/unit warnings") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: openvswitch: select vport upcall portid directly | Tonghao Zhang | 1 | -2/+3
The commit 69c51582ff786 ("dpif-netlink: don't allocate per thread netlink sockets"), in Open vSwitch ovs-vswitchd, has changed the number of allocated sockets to just one per port by moving the socket array from a per handler structure to a per datapath one. In the kernel datapath, a vport will have only one socket in most cases; if so, select it directly in the fast path. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
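The fast path described here amounts to skipping the hash-based selection when there is only one candidate. A minimal sketch under stated assumptions: `struct portids` and `select_upcall_portid()` are hypothetical, and the plain modulo used for the multi-socket case is a simplification chosen for the example, not necessarily what the datapath does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a vport's list of upcall port IDs. */
struct portids {
    uint32_t n_ids;
    const uint32_t *ids;
};

static uint32_t select_upcall_portid(const struct portids *p, uint32_t hash)
{
    assert(p->n_ids > 0);
    if (p->n_ids == 1)
        return p->ids[0];            /* fast path: nothing to choose between */
    return p->ids[hash % p->n_ids];  /* modulo spreading is an assumption */
}
```

With a single ID the hash is never consulted, which is the saving the commit message is after; with several IDs the flow hash still picks among them.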
2019-11-06 | net: axienet: Fix error return code in axienet_probe() | Wei Yongjun | 1 | -4/+0
In the DMA memory resource get failed case, the error is not set and 0 will be returned. Fix it by removing redundant check since devm_ioremap_resource() will handle it. Fixes: 28ef9ebdb64c ("net: axienet: make use of axistream-connected attribute optional") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | net: aquantia: fix return value check in aq_ptp_init() | Wei Yongjun | 1 | -1/+1
Function ptp_clock_register() returns ERR_PTR() and never returns NULL. The NULL test should be removed. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
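Why a NULL test is the wrong check for a function that reports failure through ERR_PTR() can be shown with a userspace re-creation of the error-pointer helpers. The helpers below are written from scratch for this example and only approximate the kernel ones; `clock_register()` and `ptp_init()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace approximation of the kernel's error-pointer helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* error values occupy the top MAX_ERRNO addresses */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical registration function that reports failure via ERR_PTR(). */
static void *clock_register(int fail)
{
    static int clock;

    return fail ? ERR_PTR(-ENOMEM) : (void *)&clock;
}

/* Returns 0 or a negative errno; a NULL test here would miss the failure. */
static int ptp_init(int fail)
{
    void *clk = clock_register(fail);

    if (IS_ERR(clk))
        return (int)PTR_ERR(clk);
    return 0;
}
```

An error pointer is a non-NULL value, so `if (!clk)` never fires on failure, while `IS_ERR(NULL)` is false, so the two checks are not interchangeable.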
2019-11-06 | ptp: ptp_clockmatrix: Fix missing unlock on error in idtcm_probe() | Wei Yongjun | 1 | -1/+3
Add the missing unlock before return from function idtcm_probe() in the error handling case. Fixes: 3a6ba7dc7799 ("ptp: Add a ptp clock driver for IDT ClockMatrix.") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
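The shape of a "missing unlock on error" fix can be sketched with a toy lock that only records whether it is held. Everything here (`struct toy_lock`, `lock_acquire()`, `lock_release()`, `setup_step()`, `probe()`) is hypothetical and stands in for the driver's mutex and probe logic.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock (assumption): tracks only whether it is currently held. */
struct toy_lock {
    int held;
};

static void lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Hypothetical setup step that can fail. */
static int setup_step(int fail)
{
    return fail ? -EIO : 0;
}

static int probe(struct toy_lock *l, int fail)
{
    int err;

    lock_acquire(l);
    err = setup_step(fail);
    if (err) {
        lock_release(l);  /* the unlock that a bare early return would skip */
        return err;
    }
    lock_release(l);
    return 0;
}
```

After a failing `probe()` the lock is free again; without the release in the error branch it would stay held and the next acquirer would block forever (here, trip the assertion).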
2019-11-06 | tipc: eliminate the dummy packet in link synching | Tuong Lien | 1 | -15/+14
When preparing tunnel packets for link failover or synchronization, to keep the algorithm safe we added a dummy packet on the pair link but never sent it out. In the case of failover, the pair link will be reset anyway. But for link synching, it will always result in retransmission of the dummy packet afterwards. We have also observed that such a retransmission, at the early stage when a new node joins a large cluster, takes some time and is hard to complete, leading to repeated retransmit failures and a link reset.
Since in commit 4929a932be33 ("tipc: optimize link synching mechanism") we have already built a dummy 'TUNNEL_PROTOCOL' message on the new link for the synchronization, there's no need for the dummy on the pair one; this commit skips it when the new mechanism is in place. In case nothing exists in the pair link's transmq, the link synching will just start and stop shortly on the peer side.
The patch is backward compatible.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | Merge branch 'lwtunnel-add-ip-and-ip6-options-setting-and-dumping' | David S. Miller | 2 | -21/+402
Xin Long says:
====================
lwtunnel: add ip and ip6 options setting and dumping
With this patchset, users can configure options by ip route encap for geneve, vxlan and erspan lwtunnel, like:
 # ip r a 1.1.1.0/24 encap ip id 1 geneve class 0 type 0 \
   data "1212121234567890" dst 10.1.0.2 dev geneve1
 # ip r a 1.1.1.0/24 encap ip id 1 vxlan gbp 456 \
   dst 10.1.0.2 dev erspan1
 # ip r a 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
   dst 10.1.0.2 dev erspan1
The iproute-side patch is attached in the reply to this mail.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | lwtunnel: add options setting and dumping for erspan | Xin Long | 2 | -2/+104
Based on the code framework built on the last patch, to support setting and dumping for erspan, we only need to add ip_tun_parse_opts_erspan() for .build_state and ip_tun_fill_encap_opts_erspan() for .fill_encap and if (tun_flags & TUNNEL_ERSPAN_OPT) for .get_encap_size. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | lwtunnel: add options setting and dumping for vxlan | Xin Long | 2 | -2/+74
Based on the code framework built on the last patch, to support setting and dumping for vxlan, we only need to add ip_tun_parse_opts_vxlan() for .build_state and ip_tun_fill_encap_opts_vxlan() for .fill_encap and if (tun_flags & TUNNEL_VXLAN_OPT) for .get_encap_size. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 | lwtunnel: add options setting and dumping for geneve | Xin Long | 2 | -16/+216
To add options setting and dumping, .build_state(), .fill_encap() and .get_encap_size() in ip_tun_lwt_ops need to be extended:
 ip_tun_build_state():
   ip_tun_parse_opts():
     ip_tun_parse_opts_geneve()
 ip_tun_fill_encap_info():
   ip_tun_fill_encap_opts():
     ip_tun_fill_encap_opts_geneve()
 ip_tun_encap_nlsize()
   ip_tun_opts_nlsize():
     if (tun_flags & TUNNEL_GENEVE_OPT)
ip_tun_parse_opts(), ip_tun_fill_encap_opts() and ip_tun_opts_nlsize() process LWTUNNEL_IP_OPTS.
ip_tun_parse_opts_geneve(), ip_tun_fill_encap_opts_geneve() and if (tun_flags & TUNNEL_GENEVE_OPT) process LWTUNNEL_IP_OPTS_GENEVE.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06lwtunnel: add options process for cmp_encapXin Long1-2/+8
When comparing two tun_info, the dst_cache member should have been skipped, as dst_cache is a per cpu pointer and its value always differs, even between two tun_info with the same keys. So this patch skips the dst_cache member and compares the key, mode and options_len only. To support future option setting, the options are compared as well. Fixes: 2d79849903e0 ("lwtunnel: ip tunnel: fix multiple routes with different encap") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
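The comparison described in this commit message can be illustrated with a minimal user-space sketch. The struct layout and every `demo_` name here are hypothetical stand-ins, not the kernel's definitions; the point is only that the cache pointer is left out and that key, mode, options_len and the option bytes decide equality.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_OPTS 16

/* Hypothetical, simplified stand-in for a tunnel key. */
struct demo_tun_key {
	uint64_t tun_id;
	uint32_t src;
	uint32_t dst;
	uint16_t tun_flags;
};

/* Hypothetical, simplified stand-in for tun_info. */
struct demo_tun_info {
	struct demo_tun_key key;
	void *dst_cache;                /* differs per instance: never compared */
	uint8_t options_len;
	uint8_t mode;
	uint8_t options[DEMO_MAX_OPTS];
};

/* Returns 0 if both encaps match, non-zero otherwise. */
static int demo_tun_cmp_encap(const struct demo_tun_info *a,
			      const struct demo_tun_info *b)
{
	/* Compare the key field by field so struct padding cannot interfere. */
	if (a->key.tun_id != b->key.tun_id || a->key.src != b->key.src ||
	    a->key.dst != b->key.dst || a->key.tun_flags != b->key.tun_flags)
		return 1;

	if (a->mode != b->mode || a->options_len != b->options_len)
		return 1;

	/* Reject lengths that would read past the option buffer. */
	if (a->options_len > DEMO_MAX_OPTS)
		return 1;

	/* dst_cache is deliberately skipped; only the option bytes remain. */
	return memcmp(a->options, b->options, a->options_len) != 0;
}
```

Two entries that differ only in dst_cache compare as equal in this sketch, while a single differing option byte makes them unequal.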
2019-11-06lwtunnel: add options process for arp requestXin Long1-3/+4
Without options copied to the dst tun_info in iptunnel_metadata_reply() called by arp_process for handling arp_request, the generated arp_reply packet may be dropped or sent out with wrong options for some tunnels like erspan and vxlan, and the traffic will break. Fixes: 63d008a4e9ee ("ipv4: send arp replies to the correct tunnel") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06tipc: reduce sensitive to retransmit failuresHoang Le1-1/+1
In a huge cluster (e.g. >200 nodes), the flow gap -> retransmit packet -> acked takes time when STATE_MSG is dropped or delayed because of heavy traffic. With the 1.5 sec tolerance value as the criterion, the link then fails easily around the 2nd or 3rd failed retransmission attempt. Instead of re-introducing the criterion of 99 failed retransmissions to fix the issue, we increase the failure detection timer to ten times the tolerance value. Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
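As a rough illustration of the new criterion, the sketch below treats a retransmission as failed only once ten times the tolerance has elapsed. The function name, the millisecond units and the idea of measuring from the first retransmission timestamp are assumptions made for the example, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Factor stated in the commit message: ten times the tolerance value. */
#define DEMO_TOLERANCE_FACTOR 10

/*
 * Hypothetical helper: reports failure only after the elapsed time since
 * the first retransmission exceeds tolerance_ms * DEMO_TOLERANCE_FACTOR.
 */
static bool demo_retransmit_failed(uint64_t now_ms, uint64_t first_retr_ms,
				   uint32_t tolerance_ms)
{
	/* A timestamp in the future means no time has elapsed yet. */
	if (now_ms < first_retr_ms)
		return false;

	return now_ms - first_retr_ms >
	       (uint64_t)tolerance_ms * DEMO_TOLERANCE_FACTOR;
}
```

With a tolerance of 1500 ms, the threshold in this sketch is 15000 ms, so a link that is still retransmitting after 3 seconds is not yet declared failed.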
2019-11-06tipc: update cluster capabilities if node deletedHoang Le1-1/+11
There are two improvements when re-calculating cluster capabilities:
- When deleting a specific down node, the capabilities need to be re-calculated.
- In tipc_node_cleanup(), there is no need to re-calculate if the node still exists in the cluster.

Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06selftest: net: add some traceroute testsFrancesco Ruggeri2-1/+323
Added the following traceroute tests.

IPV6:
Verify that in this scenario

       ------------------------ N2
        |                    |
      ------              ------  N3  ----
      | R1 |              | R2 |------|H2|
      ------              ------      ----
        |                    |
       ------------------------ N1
                 |
                ----
                |H1|
                ----

where H1's default route goes through R1 and R1's default route goes through R2 over N2, traceroute6 from H1 to H2 reports R2's address on N2 and not N1.

IPV4:
Verify that traceroute from H1 to H2 shows 1.0.1.1 in this scenario

                   1.0.3.1/24
---- 1.0.1.3/24    1.0.1.1/24 ---- 1.0.2.1/24    1.0.2.4/24 ----
|H1|--------------------------|R1|--------------------------|H2|
----            N1            ----            N2            ----

where net.ipv4.icmp_errors_use_inbound_ifaddr is set on R1 and 1.0.3.1/24 and 1.0.1.1/24 are respectively R1's primary and secondary address on N1.

v2: fixed some typos, and have bridge in R1 instead of R2 in IPV6 test.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06ice: Fix return value when SR-IOV is not supportedAnirudh Venkataramanan1-1/+1
When the device is not capable of supporting SR-IOV, -ENODEV is returned; -EOPNOTSUPP is more appropriate. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
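A small sketch of the distinction: the device exists, but the requested operation is not supported on it, so the "operation not supported" code fits better than "no such device". The function name and its parameters are hypothetical and do not reflect the driver's actual signature.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical configure hook: returns the number of VFs enabled on
 * success, or a negative errno value on failure.
 */
static int demo_sriov_configure(bool sriov_capable, int num_vfs)
{
	/* The device is present but lacks the capability. */
	if (!sriov_capable)
		return -EOPNOTSUPP;

	return num_vfs;
}
```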
2019-11-06ice: Rename VF function ice_vc_dis_vf to match its behaviorBrett Creeley1-7/+5
ice_vc_dis_vf() tells iavf that it's going to perform a reset and then performs a software reset. The function name is misleading because the VF does not get disabled. Fix this by renaming it to ice_vc_reset_vf(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-06ice: Get rid of ice_cleanup_headerKrzysztof Kazimierczak1-25/+2
ice_cleanup_hdrs() has been stripped of most of its content and now only serves as a wrapper for eth_skb_pad(). We can get rid of it altogether and simplify the codebase. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-06ice: print PCI link speed and widthPaul Greenwalt1-0/+3
Print message to inform user of PCI link speed and width. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-06ice: print unsupported module messagePaul Greenwalt2-0/+10
Print message to inform user if unsupported module is inserted, and extend the topology / configuration detection. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-06ice: write register with correct offsetMitch Williams1-1/+1
The VF_MBX_ARQLEN register array is per-PF, not global, so we should not use the absolute VF ID as an index. Instead, use the per-PF VF ID. This fixes an issue with VFs on PFs other than 0 not seeing reset. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
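The indexing issue can be sketched as follows, assuming each PF owns a contiguous range of absolute VF IDs starting at a base. The struct, the array size and all `demo_` names are hypothetical; only the idea of converting an absolute VF ID to a per-PF index before touching a per-PF register array comes from the commit message.

```c
#include <stdint.h>

#define DEMO_VFS_PER_PF 4

/* Hypothetical model of one PF and its per-PF register array. */
struct demo_pf {
	uint32_t vf_base_id;                   /* first absolute VF ID of this PF */
	uint32_t mbx_arqlen[DEMO_VFS_PER_PF];  /* indexed by per-PF VF ID */
};

/*
 * Writes val to the register of the VF identified by its absolute ID.
 * Returns the per-PF index used, or -1 if the VF is not owned by this PF.
 */
static int demo_write_arqlen(struct demo_pf *pf, uint32_t abs_vf_id,
			     uint32_t val)
{
	uint32_t rel_vf_id;

	if (abs_vf_id < pf->vf_base_id)
		return -1;

	/* Convert the absolute ID to an index relative to this PF. */
	rel_vf_id = abs_vf_id - pf->vf_base_id;
	if (rel_vf_id >= DEMO_VFS_PER_PF)
		return -1;

	pf->mbx_arqlen[rel_vf_id] = val;
	return (int)rel_vf_id;
}
```

On a PF whose base is 0 the absolute and per-PF IDs coincide, which is why indexing by the absolute ID only misbehaves on the other PFs.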
2019-11-06ice: Check for null pointer dereference when setting ringsMichal Swiatkowski1-4/+14
Without this check, rebuilding the VSI can lead to a kernel panic. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>