aboutsummaryrefslogtreecommitdiffstats
path: root/include/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-10net: Change TCA_ACT_* to TCA_ID_* to match that of TCA_ID_POLICEEli Cohen9-11/+11
Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with TCA_ID_POLICE and also differentiates these identifier, used in struct tc_action_ops type field, from other macros starting with TCA_ACT_. To make things clearer, we name the enum defining the TCA_ID_* identifiers and also change the "type" field of struct tc_action to id. Signed-off-by: Eli Cohen <eli@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08devlink: publish params only after driver init is doneJiri Pirko1-0/+11
Currently, user can do dump or get of param values right after the devlink params are registered. However the driver may not be initialized which is an issue. The same problem happens during notification upon param registration. Allow driver to publish devlink params whenever it is ready to handle get() ops. Note that this cannot be resolved by init reordering, as the "driverinit" params have to be available before the driver is initialized (it needs the param values there). Signed-off-by: Jiri Pirko <jiri@mellanox.com> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-5/+15
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health report functionalityEran Ben Elisha1-0/+9
Upon error discover, every driver can report it to the devlink health mechanism via devlink_health_report function, using the appropriate reporter registered to it. Driver can pass error specific context which will be delivered to it as part of the dump / recovery callbacks. Once an error is reported, devlink health will do the following actions: * A log is being send to the kernel trace events buffer * Health status and statistics are being updated for the reporter instance * Object dump is being taken and stored at the reporter instance (as long as there is no other dump which is already stored) * Auto recovery attempt is being done. Depends on: - Auto Recovery configuration - Grace period vs. Time since last recover Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health reporter create/destroy functionalityEran Ben Elisha1-0/+53
Devlink health reporter is an instance for reporting, diagnosing and recovering from run time errors discovered by the reporters. Define it's data structure and supported operations. In addition, expose devlink API to create and destroy a reporter. Each devlink instance will hold it's own reporters list. As part of the allocation, driver shall provide a set of callbacks which will be used by devlink in order to handle health reports and user commands related to this reporter. In addition, driver is entitled to provide some priv pointer, which can be fetched from the reporter by devlink_health_reporter_priv function. For each reporter, devlink will hold a metadata of statistics, dump msg and status. For passing dumps and diagnose data to the user-space, it will use devlink fmsg API. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add devlink formatted message (fmsg) APIEran Ben Elisha1-0/+149
Devlink fmsg is a mechanism to pass descriptors between drivers and devlink, in json-like format. The API allows the driver to add nested attributes such as object, object pair and value array, in addition to attributes such as name and value. Driver can use this API to fill the fmsg context in a format which will be translated by the devlink to the netlink message later. There is no memory allocation in advance (other than the initial list head), and it dynamically allocates messages descriptors and add them to the list on the fly. When it needs to send the data using SKBs to the netlink layer, it fragments the data between different SKBs. In order to do this fragmentation, it uses virtual nests attributes, to avoid actual nesting use which cannot be divided between different SKBs. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_IDFlorian Fainelli1-11/+0
Now that we have a dedicated NDO for getting a port's parent ID, get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the NDO exclusively. This is a preliminary change to getting rid of switchdev_ops eventually. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add wake-up-on-lan and queue to flow_actionPablo Neira Ayuso1-0/+7
These actions need to be added to support the ethtool_rx_flow interface. The queue action includes a field to specify the RSS context, that is set via FLOW_RSS flow type flag and the rss_context field in struct ethtool_rxnfc, plus the corresponding queue index. FLOW_RSS implies that rss_context is non-zero, therefore, queue.ctx == 0 means that FLOW_RSS was not set. Also add a field to store the vf index which is stored in the ethtool_rxnfc ring_cookie field. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06cls_flower: don't expose TC actions to drivers anymorePablo Neira Ayuso1-1/+0
Now that drivers have been converted to use the flow action infrastructure, remove this field from the tc_cls_flower_offload structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add statistics retrieval infrastructure and use itPablo Neira Ayuso2-0/+15
This patch provides the flow_stats structure that acts as container for tc_cls_flower_offload, then we can use to restore the statistics on the existing TC actions. Hence, tcf_exts_stats_update() is not used from drivers anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06cls_api: add translator to flow_action representationPablo Neira Ayuso1-0/+2
This patch implements a new function to translate from native TC action to the new flow_action representation. Moreover, this patch also updates cls_flower to use this new function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add flow action infrastructurePablo Neira Ayuso2-1/+69
This new infrastructure defines the nic actions that you can perform from existing network drivers. This infrastructure allows us to avoid a direct dependency with the native software TC action representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add flow_rule and flow_match structures and use themPablo Neira Ayuso2-3/+123
This patch wraps the dissector key and mask - that flower uses to represent the matching side - around the flow_match structure. To avoid a follow up patch that would edit the same LoCs in the drivers, this patch also wraps this new flow match structure around the flow rule object. This new structure will also contain the flow actions in follow up patches. This introduces two new interfaces: bool flow_rule_match_key(rule, dissector_id) that returns true if a given matching key is set on, and: flow_rule_match_XYZ(rule, &match); To fetch the matching side XYZ into the match container structure, to retrieve the key and the mask with one single call. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04netfilter: nf_tables: unbind set in rule from commit pathPablo Neira Ayuso1-4/+13
Anonymous sets that are bound to rules from the same transaction trigger a kernel splat from the abort path due to double set list removal and double free. This patch updates the logic to search for the transaction that is responsible for creating the set and disable the set list removal and release, given the rule is now responsible for this. Lookup is reverse since the transaction that adds the set is likely to be at the tail of the list. Moreover, this patch adds the unbind step to deliver the event from the commit path. This should not be done from the worker thread, since we have no guarantees of in-order delivery to the listener. This patch removes the assumption that both activate and deactivate callbacks need to be provided. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-03net: devlink: report cell size of shared buffersJakub Kicinski1-0/+1
Shared buffer allocation is usually done in cell increments. Drivers will either round up the allocation or refuse the configuration if it's not an exact multiple of cell size. Drivers know exactly the cell size of shared buffer, so help out users by providing this information in dumps. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMP[NS]_NEWDeepa Dinamani1-0/+1
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of socket timestamp options. These are the y2038 safe versions of the SO_TIMESTAMP_OLD and SO_TIMESTAMPNS_OLD for all architectures. Note that the format of scm_timestamping.ts[0] is not changed in this patch. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-alpha@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01ethtool: add compat for devlink infoJakub Kicinski1-0/+10
If driver did not fill the fw_version field, try to call into the new devlink get_info op and collect the versions that way. We assume ethtool was always reporting running versions. v4: - use IS_REACHABLE() to avoid problems with DEVLINK=m (kbuildbot). v3 (Jiri): - do a dump and then parse it instead of special handling; - concatenate all versions (well, all that fit :)). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01devlink: add generic info version namesJakub Kicinski1-0/+14
Add defines and docs for generic info versions. v3: - add docs; - separate patch (Jiri). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01devlink: add version reporting to devlink info APIJakub Kicinski1-0/+33
ethtool -i has a few fixed-size fields which can be used to report firmware version and expansion ROM version. Unfortunately, modern hardware has more firmware components. There is usually some datapath microcode, management controller, PXE drivers, and a CPLD load. Running ethtool -i on modern controllers reveals the fact that vendors cram multiple values into firmware version field. Here are some examples from systems I could lay my hands on quickly: tg3: "FFV20.2.17 bc 5720-v1.39" i40e: "6.01 0x800034a4 1.1747.0" nfp: "0.0.3.5 0.25 sriov-2.1.16 nic" Add a new devlink API to allow retrieving multiple versions, and provide user-readable name for those versions. While at it break down the versions into three categories: - fixed - this is the board/fixed component version, usually vendors report information like the board version in the PCI VPD, but it will benefit from naming and common API as well; - running - this is the running firmware version; - stored - this is firmware in the flash, after firmware update this value will reflect the flashed version, while the running version may only be updated after reboot. v3: - add per-type helpers instead of using the special argument (Jiri). RFCv2: - remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now versions are mixed with other info attrs)l - have the driver report versions from the same callback as other info. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01devlink: add device information APIJakub Kicinski1-0/+18
ethtool -i has served us well for a long time, but its showing its limitations more and more. The device information should also be reported per device not per-netdev. Lay foundation for a simple devlink-based way of reading device info. Add driver name and device serial number as initial pieces of information exposed via this new API. v3: - rename helpers (Jiri); - rename driver name attr (Jiri); - remove double spacing in commit message (Jiri). RFC v2: - wrap the skb into an opaque structure (Jiri); - allow the serial number of be any length (Jiri & Andrew); - add driver name (Jonathan). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESSDave Watson1-0/+1
Currently we don't zerocopy if the crypto framework async bit is set. However some crypto algorithms (such as x86 AESNI) support async, but in the context of sendmsg, will never run asynchronously. Instead, check for actual EINPROGRESS return code before assuming algorithm is async. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Add tls 1.3 supportDave Watson1-17/+49
TLS 1.3 has minor changes from TLS 1.2 at the record layer. * Header now hardcodes the same version and application content type in the header. * The real content type is appended after the data, before encryption (or after decryption). * The IV is xored with the sequence number, instead of concatinating four bytes of IV with the explicit IV. * Zero-padding: No exlicit length is given, we search backwards from the end of the decrypted data for the first non-zero byte, which is the content type. Currently recv supports reading zero-padding, but there is no way for send to add zero padding. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Refactor tls aad space size calculationDave Watson1-0/+1
TLS 1.3 has a different AAD size, use a variable in the code to make TLS 1.3 support easy. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Support 256 bit keysDave Watson1-1/+4
Wire up support for 256 bit keys from the setsockopt to the crypto framework Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01mac80211: fix missing/malformed documentationJohannes Berg1-8/+36
Fix the missing and malformed documentation that kernel-doc and sphinx warn about. While at it, also add some things to the docs to fix missing links. Sadly, the only way I could find to fix this was to add some trailing whitespace. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-01cfg80211: add missing documentation that kernel-doc warns aboutJohannes Berg1-1/+14
Add the missing documentation that kernel-doc continually warns about, to get rid of all that noise. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-01netlink: reduce NLA_POLICY_NESTED{,_ARRAY} argumentsJohannes Berg1-2/+6
In typical cases, there's no need to pass both the maxattr and the policy array pointer, as the maxattr should just be ARRAY_SIZE(policy) - 1. Therefore, to be less error prone, just remove the maxattr argument from the default macros and deduce the size accordingly. Leave the original macros with a leading underscore to use here and in case somebody needs to pass a policy pointer where the policy isn't declared in the same place and thus ARRAY_SIZE() cannot be used. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-01Merge remote-tracking branch 'net-next/master' into mac80211-nextJohannes Berg28-255/+323
Merge net-next so that we get the changes from net, which would otherwise conflict with the NLA_POLICY_NESTED/_ARRAY changes. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-01cfg80211: fix typoMatteo Croce1-1/+1
Fix spelling mistake in cfg80211.h: "lenght" -> "length". The typo is also in the special comment block which translates to documentation. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-01mac80211: Fix documentation strings for airtime-related variablesToke Høiland-Jørgensen1-1/+1
There was a typo in the documentation for weight_multiplier in mac80211.h, and the doc was missing entirely for airtime and airtime_weight in sta_info.h. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-30ipvlan, l3mdev: fix broken l3s mode wrt local routesDaniel Borkmann1-1/+2
While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin, I ran into the issue that while l3 mode is working fine, l3s mode does not have any connectivity to kube-apiserver and hence all pods end up in Error state as well. The ipvlan master device sits on top of a bond device and hostns traffic to kube-apiserver (also running in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573 where the latter is the address of the bond0. While in l3 mode, a curl to https://10.152.183.1:443 or to https://139.178.29.207:37573 works fine from hostns, neither of them do in case of l3s. In the latter only a curl to https://127.0.0.1:37573 appeared to work where for local addresses of bond0 I saw kernel suddenly starting to emit ARP requests to query HW address of bond0 which remained unanswered and neighbor entries in INCOMPLETE state. These ARP requests only happen while in l3s. Debugging this further, I found the issue is that l3s mode is piggy- backing on l3 master device, and in this case local routes are using l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback"). I found that reverting them back into using the net->loopback_dev fixed ipvlan l3s connectivity and got everything working for the CNI. Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the l3mdev paper in [0] the only sole reason why ipvlan l3s is relying on l3 master device is to get the l3mdev_ip_rcv() receive hook for setting the dst entry of the input route without adding its own ipvlan specific hacks into the receive path, however, any l3 domain semantics beyond just that are breaking l3s operation. Note that ipvlan also has the ability to dynamically switch its internal operation from l3 to l3s for all ports via ipvlan_set_port_mode() at runtime. In any case, l3 vs l3s soley distinguishes itself by 'de-confusing' netfilter through switching skb->dev to ipvlan slave device late in NF_INET_LOCAL_IN before handing the skb to L4. Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which, if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook without any additional l3mdev semantics on top. This should also have minimal impact since dev->priv_flags is already hot in cache. With this set, l3s mode is working fine and I also get things like masquerading pod traffic on the ipvlan master properly working. [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant") Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback") Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Mahesh Bandewar <maheshb@google.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Florian Westphal <fw@strlen.de> Cc: Martynas Pumputis <m@lambda.lt> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockoptXin Long1-0/+2
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_scheduler and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_scheduler, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_STREAM_SCHEDULER in this patch. It also adds default_ss in sctp_sock to support SCTP_FUTURE_ASSOC. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockoptXin Long1-0/+2
Check with SCTP_FUTURE_ASSOC instead in sctp_set/getsockopt_paddr_thresholds, it's compatible with 0. It also adds pf_retrans in sctp_sock to support SCTP_FUTURE_ASSOC. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29devlink: Add a generic wake_on_lan port parameterVasundhara Volam1-0/+8
wake_on_lan - Enables Wake on Lan for this port. If enabled, the controller asserts a wake pin based on the WOL type. v2->v3: - Define only WOL types used now and define them as bitfield, so that mutliple WOL types can be enabled upon power on. - Modify "wake-on-lan" name to "wake_on_lan" to be symmetric with previous definitions. - Rename DEVLINK_PARAM_WOL_XXX to DEVLINK_PARAM_WAKE_XXX to be symmetrical with ethtool WOL definitions. Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29devlink: Add devlink notifications support for port paramsVasundhara Volam1-0/+8
Add notification call for devlink port param set, register and unregister functions. Add devlink_port_param_value_changed() function to enable the driver notify devlink on value change. Driver should use this function after value was changed on any configuration mode part to driverinit. v7->v8: Order devlink_port_param_value_changed() definitions followed by devlink_param_value_changed() Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29devlink: Add support for driverinit set value for devlink_portVasundhara Volam1-0/+11
Add support for "driverinit" configuration mode value for devlink_port configuration parameters. Add devlink_port_param_driverinit_value_set() function to help the driver set the value to devlink_port. Also, move the common code to __devlink_param_driverinit_value_set() to be used by both device and port params. v7->v8: Re-order the definitions as follows: __devlink_param_driverinit_value_get __devlink_param_driverinit_value_set devlink_param_driverinit_value_get devlink_param_driverinit_value_set devlink_port_param_driverinit_value_get devlink_port_param_driverinit_value_set Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29devlink: Add support for driverinit get value for devlink_portVasundhara Volam1-0/+12
Add support for "driverinit" configuration mode value for devlink_port configuration parameters. Add devlink_port_param_driverinit_value_get() function to help the driver get the value from devlink_port. Also, move the common code to __devlink_param_driverinit_value_get() to be used by both device and port params. v7->v8: -Add the missing devlink_port_param_driverinit_value_get() declaration. -Also, order devlink_port_param_driverinit_value_get() after devlink_param_driverinit_value_get/set() calls Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29devlink: Add devlink_param for port register and unregisterVasundhara Volam1-0/+22
Add functions to register and unregister for the driver supported configuration parameters table per port. v7->v8: - Order the definitions following way as suggested by Jiri. __devlink_params_register __devlink_params_unregister devlink_params_register devlink_params_unregister devlink_port_params_register devlink_port_params_unregister - Append with Acked-by: Jiri Pirko <jiri@mellanox.com>. v2->v3: - Add a helper __devlink_params_register() with common code used by both devlink_params_register() and devlink_port_params_register(). Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+2
2019-01-28net: tls: Save iv in tls_rec for async crypto requestsDave Watson1-0/+2
aead_request_set_crypt takes an iv pointer, and we change the iv soon after setting it. Some async crypto algorithms don't save the iv, so we need to save it in the tls_rec for async requests. Found by hardcoding x64 aesni to use async crypto manager (to test the async codepath), however I don't think this combination can happen in the wild. Presumably other hardware offloads will need this fix, but there have been no user reports. Fixes: a42055e8d2c30 ("Add support for async encryption of records...") Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller3-0/+18
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-01-29 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Teach verifier dead code removal, this also allows for optimizing / removing conditional branches around dead code and to shrink the resulting image. Code store constrained architectures like nfp would have hard time doing this at JIT level, from Jakub. 2) Add JMP32 instructions to BPF ISA in order to allow for optimizing code generation for 32-bit sub-registers. Evaluation shows that this can result in code reduction of ~5-20% compared to 64 bit-only code generation. Also add implementation for most JITs, from Jiong. 3) Add support for __int128 types in BTF which is also needed for vmlinux's BTF conversion to work, from Yonghong. 4) Add a new command to bpftool in order to dump a list of BPF-related parameters from the system or for a specific network device e.g. in terms of available prog/map types or helper functions, from Quentin. 5) Add AF_XDP sock_diag interface for querying sockets from user space which provides information about the RX/TX/fill/completion rings, umem, memory usage etc, from Björn. 6) Add skb context access for skb_shared_info->gso_segs field, from Eric. 7) Add support for testing flow dissector BPF programs by extending existing BPF_PROG_TEST_RUN infrastructure, from Stanislav. 8) Split BPF kselftest's test_verifier into various subgroups of tests in order better deal with merge conflicts in this area, from Jakub. 9) Add support for queue/stack manipulations in bpftool, from Stanislav. 10) Document BTF, from Yonghong. 11) Dump supported ELF section names in libbpf on program load failure, from Taeung. 12) Silence a false positive compiler warning in verifier's BTF handling, from Peter. 13) Fix help string in bpftool's feature probing, from Prashant. 14) Remove duplicate includes in BPF kselftests, from Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller10-83/+129
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree: 1) Introduce a hashtable to speed up object lookups, from Florian Westphal. 2) Make direct calls to built-in extension, also from Florian. 3) Call helper before confirming the conntrack as it used to be originally, from Florian. 4) Call request_module() to autoload br_netfilter when physdev is used to relax the dependency, also from Florian. 5) Allow to insert rules at a given position ID that is internal to the batch, from Phil Sutter. 6) Several patches to replace conntrack indirections by direct calls, and to reduce modularization, from Florian. This also includes several follow up patches to deal with minor fallout from this rework. 7) Use RCU from conntrack gre helper, from Florian. 8) GRE conntrack module becomes built-in into nf_conntrack, from Florian. 9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(), from Florian. 10) Unify sysctl handling at the core of nf_conntrack, from Florian. 11) Provide modparam to register conntrack hooks. 12) Allow to match on the interface kind string, from wenxu. 13) Remove several exported symbols, not required anymore now after a bit of de-modulatization work has been done, from Florian. 14) Remove built-in map support in the hash extension, this can be done with the existing userspace infrastructure, from laura. 15) Remove indirection to calculate checksums in IPVS, from Matteo Croce. 16) Use call wrappers for indirection in IPVS, also from Matteo. 17) Remove superfluous __percpu parameter in nft_counter, patch from Luc Van Oostenryck. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28ipvs: avoid indirect calls when calculating checksumsMatteo Croce1-3/+0
The function pointer ip_vs_protocol->csum_check is only used in protocol specific code, and never in the generic one. Remove the function pointer from struct ip_vs_protocol and call the checksum functions directly. This reduces the performance impact of the Spectre mitigation, and should give a small improvement even with RETPOLINES disabled. Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-27tcp: change pingpong threshold to 3Wei Wang1-1/+9
In order to be more confident about an on-going interactive session, we increment pingpong count by 1 for every interactive transaction and we adjust TCP_PINGPONG_THRESH to 3. This means, we only consider a session in pingpong mode after we see 3 interactive transactions, and start to activate delayed acks in quick ack mode. And in order to not over-count the credits, we only increase pingpong count for the first packet sent in response for the previous received packet. This is mainly to prevent delaying the ack immediately after some handshake protocol but no real interactive traffic pattern afterwards. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27tcp: Refactor pingpong codeWei Wang1-0/+17
Instead of using pingpong as a single bit information, we refactor the code to treat it as a counter. When interactive session is detected, we set pingpong count to TCP_PINGPONG_THRESH. And when pingpong count is >= TCP_PINGPONG_THRESH, we consider the session in pingpong mode. This patch is a pure refactor and sets foundation for the next patch. This patch itself does not change any pingpong logic. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+12
2019-01-26ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmitwenxu1-1/+1
Add tnl_update_pmtu in ip_md_tunnel_xmit to dynamic modify the pmtu which packet send through collect_metadata mode ip tunnel Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25tcp: allow zerocopy with fastopenWillem de Bruijn1-0/+1
Accept MSG_ZEROCOPY in all the TCP states that allow sendmsg. Remove the explicit check for ESTABLISHED and CLOSE_WAIT states. This requires correctly handling zerocopy state (uarg, sk_zckey) in all paths reachable from other TCP states. Such as the EPIPE case in sk_stream_wait_connect, which a sendmsg() in incorrect state will now hit. Most paths are already safe. Only extension needed is for TCP Fastopen active open. This can build an skb with data in tcp_send_syn_data. Pass the uarg along with other fastopen state, so that this skb also generates a zerocopy notification on release. Tested with active and passive tcp fastopen packetdrill scripts at https://github.com/wdebruij/packetdrill/commit/1747eef03d25a2404e8132817d0f1244fd6f129d Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25net: IP6 defrag: use rbtrees for IPv6 defragPeter Oskolkov1-2/+9
Currently, IPv6 defragmentation code drops non-last fragments that are smaller than 1280 bytes: see commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags smaller than min mtu") This behavior is not specified in IPv6 RFCs and appears to break compatibility with some IPv6 implemenations, as reported here: https://www.spinics.net/lists/netdev/msg543846.html This patch re-uses common IP defragmentation queueing and reassembly code in IPv6, removing the 1280 byte restriction. Signed-off-by: Peter Oskolkov <posk@google.com> Reported-by: Tom Herbert <tom@herbertland.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25net: IP defrag: encapsulate rbtree defrag code into callable functionsPeter Oskolkov1-2/+14
This is a refactoring patch: without changing runtime behavior, it moves rbtree-related code from IPv4-specific files/functions into .h/.c defrag files shared with IPv6 defragmentation code. Signed-off-by: Peter Oskolkov <posk@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>