aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-01-25net: IP6 defrag: use rbtrees for IPv6 defragPeter Oskolkov1-2/+9
Currently, IPv6 defragmentation code drops non-last fragments that are smaller than 1280 bytes: see commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags smaller than min mtu") This behavior is not specified in IPv6 RFCs and appears to break compatibility with some IPv6 implemenations, as reported here: https://www.spinics.net/lists/netdev/msg543846.html This patch re-uses common IP defragmentation queueing and reassembly code in IPv6, removing the 1280 byte restriction. Signed-off-by: Peter Oskolkov <posk@google.com> Reported-by: Tom Herbert <tom@herbertland.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25net: IP defrag: encapsulate rbtree defrag code into callable functionsPeter Oskolkov1-2/+14
This is a refactoring patch: without changing runtime behavior, it moves rbtree-related code from IPv4-specific files/functions into .h/.c defrag files shared with IPv6 defragmentation code. Signed-off-by: Peter Oskolkov <posk@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25net: Revert devlink health changes.David S. Miller3-231/+0
This reverts the devlink health changes from 9/17/2019, Jiri wants things to be designed differently and it was agreed that the easiest way to do this is start from the beginning again. Commits reverted: cb5ccfbe73b389470e1dc11061bb185ef4bc9aec 880ee82f0313453ec5a6cb122866ac057263066b c7af343b4e33578b7de91786a3f639c8cfa0d97b ff253fedab961b22117a73ab808fcfa9e6852b50 6f9d56132eb6d2603d4273cfc65bed914ec47acb fcd852c69d776c0f46c8f79e8e431e5cc6ddc7b7 8a66704a13d9713593342e29b4f0c19762f5746b 12bd0dcefe88782ac1c9fff632958dd1b71d27e5 aba25279c10094c5c97d09c3491ca86d00b4ad5e ce019faa70f81555fa17ebc1d5a03651f2e7e15a b8c45a033acc607201588f7665ba84207e5149e0 And the follow-on build fix: o33a0efa4baecd689da9474ce0e8b673eb6931c60 Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24tcp_bbr: adapt cwnd based on ack aggregation estimationPriyaranjan Jha1-2/+2
Aggregation effects are extremely common with wifi, cellular, and cable modem link technologies, ACK decimation in middleboxes, and LRO and GRO in receiving hosts. The aggregation can happen in either direction, data or ACKs, but in either case the aggregation effect is visible to the sender in the ACK stream. Previously BBR's sending was often limited by cwnd under severe ACK aggregation/decimation because BBR sized the cwnd at 2*BDP. If packets were acked in bursts after long delays (e.g. one ACK acking 5*BDP after 5*RTT), BBR's sending was halted after sending 2*BDP over 2*RTT, leaving the bottleneck idle for potentially long periods. Note that loss-based congestion control does not have this issue because when facing aggregation it continues increasing cwnd after bursts of ACKs, growing cwnd until the buffer is full. To achieve good throughput in the presence of aggregation effects, this algorithm allows the BBR sender to put extra data in flight to keep the bottleneck utilized during silences in the ACK stream that it has evidence to suggest were caused by aggregation. A summary of the algorithm: when a burst of packets are acked by a stretched ACK or a burst of ACKs or both, BBR first estimates the expected amount of data that should have been acked, based on its estimated bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max filter to estimate the recent level of observed ACK aggregation. Then cwnd is increased by the ACK aggregation estimate. The larger cwnd avoids BBR being cwnd-limited in the face of ACK silences that recent history suggests were caused by aggregation. As a sanity check, the ACK aggregation degree is upper-bounded by the cwnd (at the time of measurement) and a global max of BW * 100ms. The algorithm is further described by the following presentation: https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00 In our internal testing, we observed a significant increase in BBR throughput (measured using netperf), in a basic wifi setup. - Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi) - 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps - 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps Also, this code is running globally on YouTube TCP connections and produced significant bandwidth increases for YouTube traffic. This is based on Ian Swett's max_ack_height_ algorithm from the QUIC BBR implementation. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24bonding: count master 3ad stats separatelyNikolay Aleksandrov1-1/+1
I made a dumb mistake when I summed up the slave stats, obviously slaves can come and go which would make the master stats unreliable. Count and export the master stats separately. Fixes: a258aeacd7f0 ("bonding: add support for xstats and export 3ad stats") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24net: phy: change phy_start_interrupts to phy_request_interruptHeiner Kallweit1-1/+1
Now that we enable the interrupts in phy_start() we don't have to do it before. Therefore remove enabling interrupts from phy_start_interrupts() and rename this function to reflect the changed functionality. v2: - improve warning to clearly state that we fall back to polling Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22ptp: add debugfs support for ptp_qoriqYangbo Lu1-0/+12
This patch is to add debugfs support for ptp_qoriq. Current debugfs supports to control fiper1/fiper2 loopback mode. If the loopback mode is enabled, the fiper1/fiper2 pulse is looped back into trigger1/ trigger2 input. This is very useful for validating hardware and driver without external hardware. Below is an example to enable fiper1 loopback. echo 1 > /sys/kernel/debug/2d10e00.ptp_clock/fiper1-loopback Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22ptp_qoriq: support external trigger stamp FIFOYangbo Lu1-0/+3
The external trigger stamp FIFO was introduced as a new feature for QorIQ 1588 timer IP block. This patch is to support it by adding a new dts property "fsl,extts-fifo". Any QorIQ 1588 timer supporting this feature is required to add this property in its dts node. In addition, the FIFO should be cleaned up before enabling external trigger interrupts. Otherwise, there will be interrupts immediately just after enabling external trigger interrupts. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22bridge: Snoop Multicast Router AdvertisementsLinus Lüssing4-0/+23
When multiple multicast routers are present in a broadcast domain then only one of them will be detectable via IGMP/MLD query snooping. The multicast router with the lowest IP address will become the selected and active querier while all other multicast routers will then refrain from sending queries. To detect such rather silent multicast routers, too, RFC4286 ("Multicast Router Discovery") provides a standardized protocol to detect multicast routers for multicast snooping switches. This patch implements the necessary MRD Advertisement message parsing and after successful processing adds such routers to the internal multicast router list. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22bridge: join all-snoopers multicast addressLinus Lüssing1-4/+5
Next to snooping IGMP/MLD queries RFC4541, section 2.1.1.a) recommends to snoop multicast router advertisements to detect multicast routers. Multicast router advertisements are sent to an "all-snoopers" multicast address. To be able to receive them reliably, we need to join this group. Otherwise other snooping switches might refrain from forwarding these advertisements to us. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() callsLinus Lüssing4-2/+32
This patch refactors ip_mc_check_igmp(), ipv6_mc_check_mld() and their callers (more precisely, the Linux bridge) to not rely on the skb_trimmed parameter anymore. An skb with its tail trimmed to the IP packet length was initially introduced for the following three reasons: 1) To be able to verify the ICMPv6 checksum. 2) To be able to distinguish the version of an IGMP or MLD query. They are distinguishable only by their size. 3) To avoid parsing data for an IGMPv3 or MLDv2 report that is beyond the IP packet but still within the skb. The first case still uses a cloned and potentially trimmed skb to verfiy. However, there is no need to propagate it to the caller. For the second and third case explicit IP packet length checks were added. This hopefully makes ip_mc_check_igmp() and ipv6_mc_check_mld() easier to read and verfiy, as well as easier to use. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22bonding: add support for xstats and export 3ad statsNikolay Aleksandrov3-0/+28
This patch adds support for extended statistics (xstats) call to the bonding. The first user would be the 3ad code which counts the following events: - LACPDU Rx/Tx - LACPDU unknown type Rx - LACPDU illegal Rx - Marker Rx/Tx - Marker response Rx/Tx - Marker unknown type Rx All of these are exported via netlink as separate attributes to be easily extensible as we plan to add more in the future. Similar to how the bridge and other xstats exports, the structure inside is: [ IFLA_STATS_LINK_XSTATS ] -> [ LINK_XSTATS_TYPE_BOND ] -> [ BOND_XSTATS_3AD ] -> [ 3ad stats attributes ] With this structure it's easy to add more stat types later. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22bonding: add 3ad statsNikolay Aleksandrov1-0/+14
Count the following types of 3ad packets per slave: - rx/tx lacpdu - rx/tx marker - rx/tx marker response - rx illegal lacpdus (right now counted on wrong length) - rx unknown lacpdu type - rx unknown marker type The counters are using atomic64 since this is not fast path. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22net: introduce a knob to control whether to inherit devconf configCong Wang1-0/+1
There have been many people complaining about the inconsistent behaviors of IPv4 and IPv6 devconf when creating new network namespaces. Currently, for IPv4, we inherit all current settings from init_net, but for IPv6 we reset all setting to default. This patch introduces a new /proc file /proc/sys/net/core/devconf_inherit_init_net to control the behavior of whether to inhert sysctl current settings from init_net. This file itself is only available in init_net. As demonstrated below: Initial setup in init_net: # cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # cat /proc/sys/net/ipv6/conf/all/accept_dad 1 Default value 0 (current behavior): # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 0 Set to 1 (inherit from init_net): # echo 1 > /proc/sys/net/core/devconf_inherit_init_net # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 1 Set to 2 (reset to default): # echo 2 > /proc/sys/net/core/devconf_inherit_init_net # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 0 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 0 Set to a value out of range (invalid): # echo 3 > /proc/sys/net/core/devconf_inherit_init_net -bash: echo: write error: Invalid argument # echo -1 > /proc/sys/net/core/devconf_inherit_init_net -bash: echo: write error: Invalid argument Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller18-39/+29
Completely minor snmp doc conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds6-21/+7
Pull networking fixes from David Miller: 1) Fix endless loop in nf_tables, from Phil Sutter. 2) Fix cross namespace ip6_gre tunnel hash list corruption, from Olivier Matz. 3) Don't be too strict in phy_start_aneg() otherwise we might not allow restarting auto negotiation. From Heiner Kallweit. 4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue. 5) Memory leak in act_tunnel_key, from Davide Caratti. 6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn. 7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet. 8) Missing udplite rehash callbacks, from Alexey Kodanev. 9) Log dirty pages properly in vhost, from Jason Wang. 10) Use consume_skb() in neigh_probe() as this is a normal free not a drop, from Yang Wei. Likewise in macvlan_process_broadcast(). 11) Missing device_del() in mdiobus_register() error paths, from Thomas Petazzoni. 12) Fix checksum handling of short packets in mlx5, from Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits) bpf: in __bpf_redirect_no_mac pull mac only if present virtio_net: bulk free tx skbs net: phy: phy driver features are mandatory isdn: avm: Fix string plus integer warning from Clang net/mlx5e: Fix cb_ident duplicate in indirect block register net/mlx5e: Fix wrong (zero) TX drop counter indication for representor net/mlx5e: Fix wrong error code return on FEC query failure net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames tools: bpftool: Cleanup license mess bpf: fix inner map masking to prevent oob under speculation bpf: pull in pkt_sched.h header for tooling to fix bpftool build selftests: forwarding: Add a test case for externally learned FDB entries selftests: mlxsw: Test FDB offload indication mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky net: bridge: Mark FDB entries that were added by user as such mlxsw: spectrum_fid: Update dummy FID index mlxsw: pci: Return error on PCI reset timeout mlxsw: pci: Increase PCI SW reset timeout mlxsw: pci: Ring CQ's doorbell before RDQ's MAINTAINERS: update email addresses of liquidio driver maintainers ...
2019-01-21Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-4/+9
Pull virtio/vhost fixes and cleanups from Michael Tsirkin: "Fixes and cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/scsi: Use copy_to_iter() to send control queue response vhost: return EINVAL if iovecs size does not match the message size virtio-balloon: tweak config_changed implementation virtio: don't allocate vqs when names[i] = NULL virtio_pci: use queue idx instead of array idx to set up the vq virtio: document virtio_config_ops restrictions virtio: fix virtio_config_ops description
2019-01-21Merge tags 'compiler-attributes-for-linus-v5.0-rc3' and 'clang-format-for-linus-v5.0-rc3' of git://github.com/ojeda/linuxLinus Torvalds4-11/+6
Pull misc clang fixes from Miguel Ojeda: - A fix for OPTIMIZER_HIDE_VAR from Michael S Tsirkin - Update clang-format with the latest for_each macro list from Jason Gunthorpe * tag 'compiler-attributes-for-linus-v5.0-rc3' of git://github.com/ojeda/linux: include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR * tag 'clang-format-for-linus-v5.0-rc3' of git://github.com/ojeda/linux: clang-format: Update .clang-format with the latest for_each macro list
2019-01-19net_sched: add performance counters for basic filterCong Wang1-0/+7
Similar to u32 filter, it is useful to know how many times we reach each basic filter and how many times we pass the ematch attached to it. Sample output: filter protocol arp pref 49152 basic chain 0 filter protocol arp pref 49152 basic chain 0 handle 0x1 (rule hit 3 success 3) action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 installed 81 sec used 4 sec Action statistics: Sent 126 bytes 3 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-20Merge tag 'mips_fixes_5.0_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds1-0/+1
Pull MIPS fixes from Paul Burton: - Fix IPI handling for Lantiq SoCs, which was broken by changes made back in v4.12. - Enable OF/DT serial support in ath79_defconfig to give us working serial by default. - Fix 64b builds for the Jazz platform. - Set up a struct device for the BCM47xx SoC to allow BCM47xx drivers to perform DMA again following the major DMA mapping changes made in v4.19. - Disable MSI on Cavium Octeon systems when the pcie_disable command line parameter introduced in v3.3 is used, in order to avoid inadvetently accessing PCIe controller registers despite the command line. - Fix a build failure for Cavium Octeon kernels with kexec enabled, introduced in v4.20. - Fix a regression in the behaviour of semctl/shmctl/msgctl IPC syscalls for kernels including n32 support but not o32 support caused by some cleanup in v3.19. * tag 'mips_fixes_5.0_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: OCTEON: fix kexec support mips: fix n32 compat_ipc_parse_version Disable MSI also when pcie-octeon.pcie_disable on MIPS: BCM47XX: Setup struct device for the SoC MIPS: jazz: fix 64bit build MIPS: ath79: Enable OF serial ports in the default config MIPS: lantiq: Use CP0_LEGACY_COMPARE_IRQ MIPS: lantiq: Fix IPI interrupt handling
2019-01-20Merge tag 'libnvdimm-fixes-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds1-0/+1
Pull libnvdimm fixes from Dan Williams: "A crash fix, a build warning fix, a miscellaneous small cleanups. In case anyone is looking for them, there was a regression caught by testing that caused two patches to be dropped from this update. Those patches have been reworked and will soak for another week / re-target 5.0-rc4. - Fix driver initialization crash due to the inability to report an 'error' state for a DIMM's security capability. - Build warning fix for little-endian ARM64 builds - Fix a potential race between the EDAC driver's usage of the NFIT SMBIOS id for a DIMM and the driver shutdown path. - A small collection of one-line benign cleanups for duplicate variable assignments, a duplicate header include and a mis-typed function argument" * tag 'libnvdimm-fixes-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/security: Fix nvdimm_security_state() state request selection acpi/nfit: Remove duplicate set nd_set in acpi_nfit_init_interleave_set() acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id() libnvdimm/dimm: Fix security capability detection for non-Intel NVDIMMs nfit: Mark some functions as __maybe_unused ACPI/nfit: delete the function to_acpi_nfit_desc ACPI/nfit: delete the redundant header file
2019-01-19net: netlink: add helper to retrieve NETLINK_F_STRICT_CHKJakub Kicinski1-0/+1
Dumps can read state of the NETLINK_F_STRICT_CHK flag from a field in the callback structure. For non-dump GET requests we need a way to access the state of that flag from a socket. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: phy: phy driver features are mandatoryCamelia Groza1-2/+2
Since phy driver features became a link_mode bitmap, phy drivers that don't have a list of features configured will cause the kernel to crash when probed. Prevent the phy driver from registering if the features field is missing. Fixes: 719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap") Reported-by: Scott Wood <oss@buserror.net> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19sch_api: Change signature of qdisc_tree_reduce_backlog() to use intsToke Høiland-Jørgensen1-2/+1
There are now several places where qdisc_tree_reduce_backlog() is called with a negative number of packets (to signal an increase in number of packets in the queue). Rather than rely on overflow behaviour, change the function signature to use signed integers to communicate this usage to people reading the code. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health dump {get,clear} commandsEran Ben Elisha1-0/+2
Add devlink health dump commands, in order to run an dump operation over a specific reporter. The supported operations are dump_get in order to get last saved dump (if not exist, dump now) and dump_clear to clear last saved dump. It is expected from driver's callback for diagnose command to fill it via the buffer descriptors API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health diagnose commandEran Ben Elisha1-0/+1
Add devlink health diagnose command, in order to run a diagnose operation over a specific reporter. It is expected from driver's callback for diagnose command to fill it via the buffer descriptors API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health recover commandEran Ben Elisha1-0/+1
Add devlink health recover command to the uapi, in order to allow the user to execute a recover operation over a specific reporter. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health set commandEran Ben Elisha1-0/+1
Add devlink health set command, in order to set configuration parameters for a specific reporter. Supported parameters are: - graceful_period: Time interval between auto recoveries (in msec) - auto_recover: Determines if the devlink shall execute recover upon receiving error for the reporter Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health get commandEran Ben Elisha1-0/+12
Add devlink health get command to provide reporter/s data for user space. Add the ability to get data per reporter or dump data from all available reporters. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health report functionalityEran Ben Elisha2-0/+71
Upon error discover, every driver can report it to the devlink health mechanism via devlink_health_report function, using the appropriate reporter registered to it. Driver can pass error specific context which will be delivered to it as part of the dump / recovery callbacks. Once an error is reported, devlink health will do the following actions: * A log is being send to the kernel trace events buffer * Health status and statistics are being updated for the reporter instance * Object dump is being taken and stored at the reporter instance (as long as there is no other dump which is already stored) * Auto recovery attempt is being done. depends on: - Auto Recovery configuration - Grace period vs. time since last recover Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health reporter create/destroy functionalityEran Ben Elisha1-0/+59
Devlink health reporter is an instance for reporting, diagnosing and recovering from run time errors discovered by the reporters. Define it's data structure and supported operations. In addition, expose devlink API to create and destroy a reporter. Each devlink instance will hold it's own reporters list. As part of the allocation, driver shall provide a set of callbacks which will be used the devlink in order to handle health reports and user commands related to this reporter. In addition, driver is entitled to provide some priv pointer, which can be fetched from the reporter by devlink_health_reporter_priv function. For each reporter, devlink will hold a metadata of statistics, buffers and status. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health buffer supportEran Ben Elisha2-0/+84
Devlink health buffer is a mechanism to pass descriptors between drivers and devlink. The API allows the driver to add objects, object pair, value array (nested attributes), value and name. Driver can use this API to fill the buffers in a format which can be translated by the devlink to the netlink message. In order to fulfill it, an internal buffer descriptor is defined. This will hold the data and metadata per each attribute and by used to pass actual commands to the netlink. This mechanism will be later used in devlink health for dump and diagnose data store by the drivers. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net_sched: add hit counter for matchallCong Wang1-0/+6
Although matchall always matches packets, however, it still relies on a protocol match first. So it is still useful to have such a counter for matchall. Of course, unlike u32, every time we hit a matchall filter, it is always a success, so we don't have to distinguish them. Sample output: filter protocol 802.1Q pref 100 matchall chain 0 filter protocol 802.1Q pref 100 matchall chain 0 handle 0x1 not_in_hw (rule hit 10) action order 1: vlan pop continue index 1 ref 1 bind 1 installed 40 sec used 1 sec Action statistics: Sent 836 bytes 10 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net: phy: remove phy_stop_interruptsHeiner Kallweit1-1/+0
Interrupts have been disabled in phy_stop() already. So we can remove phy_stop_interrupts() and free the interrupt in phy_disconnect() directly. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net: Fix usage of pskb_trim_rcsumRoss Lagerwall1-0/+1
In certain cases, pskb_trim_rcsum() may change skb pointers. Reinitialize header pointers afterwards to avoid potential use-after-frees. Add a note in the documentation of pskb_trim_rcsum(). Found by KASAN. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18tcp: declare tcp_mmap() only when CONFIG_MMU is setYafang Shao1-0/+2
Since tcp_mmap() is defined when CONFIG_MMU is set. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net: phy: remove state PHY_CHANGELINKHeiner Kallweit1-6/+0
Since recent changes to the phylib state machine state PHY_CHANGELINK isn't used any longer. Therefore let's remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19Merge tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linuxLinus Torvalds1-0/+1
Pull fbdev fixes from Bartlomiej Zolnierkiewicz: - fix stack memory leak in omap2fb driver (Vlad Tsyrklevich) - fix OF node name handling v4.20 regression in offb driver (Rob Herring) - convert CONFIG_FB_LOGO_CENTER config option added in v5.0-rc1 into a kernel parameter (Peter Rosin) * tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux: fbdev: fbmem: convert CONFIG_FB_LOGO_CENTER into a cmd line option fbdev: offb: Fix OF node name handling omap2fb: Fix stack memory disclosure
2019-01-17qed: remove duplicated include from qed_if.hYueHaibing1-1/+0
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-0/+1
Pull rdma fixes frfom Jason Gunthorpe: "Not much so far. We have the usual batch of bugs and two fixes to code merged this cycle: - Restore valgrind support for the ioctl verbs interface merged this window, and fix a missed error code on an error path from that conversion - A user reported crash on obsolete mthca hardware - pvrdma was using the wrong command opcode toward the hypervisor - NULL pointer crash regression when dumping rdma-cm over netlink - Be conservative about exposing the global rkey" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT RDMA/mthca: Clear QP objects during their allocation RDMA/vmw_pvrdma: Return the correct opcode when creating WR RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type RDMA/nldev: Don't expose unsafe global rkey to regular user RDMA/uverbs: Fix post send success return value in case of error
2019-01-17switchdev: Add extack argument to call_switchdev_notifiers()Petr Machata1-2/+4
A follow-up patch will enable vetoing of FDB entries. Make it possible to communicate details of why an FDB entry is not acceptable back to the user. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17vxlan: Add extack to switchdev operationsPetr Machata1-2/+4
There are four sources of VXLAN switchdev notifier calls: - the changelink() link operation, which already supports extack, - ndo_fdb_add() which got extack support in a previous patch, - FDB updates due to packet forwarding, - and vxlan_fdb_replay(). Extend vxlan_fdb_switchdev_call_notifiers() to include extack in the switchdev message that it sends, and propagate the argument upwards to the callers. For the first two cases, pass in the extack gotten through the operation. For case #3, pass in NULL. To cover the last case, extend vxlan_fdb_replay() to take extack argument, which might come from whatever operation necessitated the FDB replay. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net: Add extack argument to ndo_fdb_add()Petr Machata1-2/+4
Drivers may not be able to support certain FDB entries, and an error code is insufficient to give clear hints as to the reasons of rejection. In order to make it possible to communicate the rejection reason, extend ndo_fdb_add() with an extack argument. Adapt the existing implementations of ndo_fdb_add() to take the parameter (and ignore it). Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net: introduce SO_BINDTOIFINDEX sockoptDavid Herrmann1-0/+2
This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. User-space often refers to network-interfaces via their index, but has to temporarily resolve it to a name for a call into SO_BINDTODEVICE. This might pose problems when the network-device is renamed asynchronously by other parts of the system. When this happens, the SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong device. In most cases user-space only ever operates on devices which they either manage themselves, or otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries this guarantee is non-obvious and it would be nice if that race-condition would simply not exist. It would make it easier for those libraries to operate even in situations where the device-name might change under the hood. A real use-case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network-interfaces during the transition from initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network-namespaces). However, the network manager now has to make sure its management engine for dormant devices will not run in parallel to these renames. Particularly, when you offload operations like DHCP into separate processes, these might setup their sockets early, and thus have to resolve the device-name possibly running into this race-condition. By avoiding a call to resolve the device-name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this race. Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17tls: Fix recvmsg() to be able to peek across multiple recordsVakul Garg1-1/+2
This fixes recvmsg() to be able to peek across multiple tls records. Without this patch, the tls's selftests test case 'recv_peek_large_buf_mult_recs' fails. Each tls receive context now maintains a 'rx_list' to retain incoming skb carrying tls records. If a tls record needs to be retained e.g. for peek case or for the case when the buffer passed to recvmsg() has a length smaller than decrypted record length, then it is added to 'rx_list'. Additionally, records are added in 'rx_list' if the crypto operation runs in async mode. The records are dequeued from 'rx_list' after the decrypted data is consumed by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK flag is used in recvmsg(), then records are not consumed or removed from the 'rx_list'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net: phy: Add helpers to determine if PHY driver is genericFlorian Fainelli1-0/+3
We are already checking in phy_detach() that the PHY driver is of generic kind (1G or 10G) and we are going to make use of that in the SFP layer as well for 1000BaseT SFP modules, so expose helper functions to return that information. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net: dsa: Include platform_data header fileFlorian Fainelli2-2/+2
b53 and mv88e6xxx support passing platform_data, and now that we have split the platform_data portion from the main net/dsa.h header file, include only the relevant parts. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net: dsa: Split platform data to header fileFlorian Fainelli2-60/+69
Instead of having net/dsa.h contain both the internal switch tree/driver structures, split the relevant platform_data parts into include/linux/platform_data/dsa.h and make that header be included by net/dsa.h in order not to break any setup. A subsequent set of patches will update code including net/dsa.h to include only the platform_data header. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18Merge tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds1-0/+2
Pull AFS fixes from David Howells: "Here's a set of fixes for AFS: - Use struct_size() for kzalloc() size calculation. - When calling YFS.CreateFile rather than AFS.CreateFile, it is possible to create a file with a file lock already held. The default value indicating no lock required is actually -1, not 0. - Fix an oops in inode/vnode validation if the target inode doesn't have a server interest assigned (ie. a server that will notify us of changes by third parties). - Fix refcounting of keys in file locking. - Fix a race in refcounting asynchronous operations in the event of an error during request transmission. The provision of a dedicated function to get an extra ref on a call is split into a separate commit" * tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix race in async call refcounting afs: Provide a function to get a ref on a call afs: Fix key refcounting in file locking code afs: Don't set vnode->cb_s_break in afs_validate() afs: Set correct lock type for the yfs CreateFile afs: Use struct_size() in kzalloc()
2019-01-18Merge tag 'devicetree-fixes-for-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linuxLinus Torvalds2-3/+1
Pull Devicetree fixes from Rob Herring: - Remove now unused struct device_node.type pointer - Fix meson-axg reset header SPDX tag - Add missing of_node_put in of_graph_get_remote_port_parent - Fix several binding doc file references and typos * tag 'devicetree-fixes-for-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: reset: meson-axg: fix SPDX license id dt-bindings: soc: qcom: Fix trivial language typos doc: gpio-mvebu: fix broken reference to cp110-system-controller0.txt file OF: properties: add missing of_node_put doc: bindings: fix bad reference to ARM CPU bindings dt-bindings: marvell,mmp2: fix typos in bindings doc of: Remove struct device_node.type pointer