aboutsummaryrefslogtreecommitdiffstats
path: root/include/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-26tcp: convert tcp_md5_needed to static_branch APIEric Dumazet1-2/+2
We prefer static_branch_unlikely() over static_key_false() these days. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26tcp: get rid of __tcp_add_write_queue_tail()Eric Dumazet1-6/+1
This helper is only used from tcp_add_write_queue_tail(), and does not make the code more readable. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26tcp: get rid of tcp_check_send_head()Eric Dumazet1-6/+0
This helper is used only once, and its name is no longer relevant. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26vxlan: add extack support for create and changelinkRoopa Prabhu1-0/+31
This patch adds extack coverage in vxlan link create and changelink paths. Introduces a new helper vxlan_nl2flags to consolidate flag attribute validation. thanks to Johannes Berg for some tips to construct the generic vxlan flag extack strings. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26devlink: create a special NDO for getting the devlink instanceJakub Kicinski1-0/+9
Instead of iterating over all devlink ports add a NDO which will return the devlink instance from the driver. v2: add the netdev_to_devlink() helper (Michal) v3: check that devlink has ops (Florian) v4: hold devlink_mutex (Jiri) Suggested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26net: devlink: turn devlink into a built-inJakub Kicinski1-6/+4
Being able to build devlink as a module causes growing pains. First all drivers had to add a meta dependency to make sure they are not built in when devlink is built as a module. Now we are struggling to invoke ethtool compat code reliably. Make devlink code built-in, users can still not build it at all but the dynamically loadable module option is removed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26net: remove unused struct inet_frag_queue.fragments fieldPeter Oskolkov1-3/+1
Now that all users of struct inet_frag_queue have been converted to use 'rb_fragments', remove the unused 'fragments' field. Build with `make allyesconfig` succeeded. ip_defrag selftest passed. Signed-off-by: Peter Oskolkov <posk@google.com> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: set dedicated tcf_walker flag when tp is emptyVlad Buslov1-0/+1
Using tcf_walker->stop flag to determine when tcf_walker->fn() was called at least once is unreliable. Some classifiers set 'stop' flag on error before calling walker callback, other classifiers used to call it with NULL filter pointer when empty. In order to prevent further regressions, extend tcf_walker structure with dedicated 'nonempty' flag. Set this flag in tcf_walker->fn() implementation that is used to check if classifier has filters configured. Fixes: 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24switchdev: Complete removal of switchdev_port_attr_get()Florian Fainelli1-4/+0
We have no more in tree users of switchdev_port_attr_get() after d0e698d57a94 ("Merge branch 'net-Get-rid-of-switchdev_port_attr_get'") so completely remove the function signature and body. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24dsa: Remove phydev parameter from disable_port callAndrew Lunn1-2/+1
No current DSA driver makes use of the phydev parameter passed to the disable_port call. Remove it. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextDavid S. Miller2-1/+2
Johan Hedberg says: ==================== Here's the main bluetooth-next pull request for the 5.1 kernel. - Fixes & improvements to mediatek, hci_qca, btrtl, and btmrvl HCI drivers - Fixes to parsing invalid L2CAP config option sizes - Locking fix to bt_accept_enqueue() - Add support for new Marvel sd8977 chipset - Various other smaller fixes & cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24tls: Return type of non-data records retrieved using MSG_PEEK in recvmsgVakul Garg1-0/+10
The patch enables returning 'type' in msghdr for records that are retrieved with MSG_PEEK in recvmsg. Further it prevents records peeked from socket from getting clubbed with any other record of different type when records are subsequently dequeued from strparser. For each record, we now retain its type in sk_buff's control buffer cb[]. Inside control buffer, record's full length and offset are already stored by strparser in 'struct strp_msg'. We store record type after 'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is stored just after record dequeue. For tls1.3, the type is stored after record has been decrypted. Inside process_rx_list(), before processing a non-data record, we check that we must be able to return back the record type to the user application. If not, the decrypted records in tls context's rx_list is left there without consuming any data. Fixes: 692d7b5d1f912 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24ipv6: icmp: use percpu allocationKefeng Wang1-1/+1
Use percpu allocation for the ipv6.icmp_sk. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-5/+12
Three conflicts, one of which, for marvell10g.c is non-trivial and requires some follow-up from Heiner or someone else. The issue is that Heiner converted the marvell10g driver over to use the generic c45 code as much as possible. However, in 'net' a bug fix appeared which makes sure that a new local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0 is cleared. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net_sched: initialize net pointer inside tcf_exts_init()Cong Wang1-2/+3
For tcindex filter, it is too late to initialize the net pointer in tcf_exts_validate(), as tcf_exts_get_net() requires a non-NULL net pointer. We can just move its initialization into tcf_exts_init(), which just requires an additional parameter. This makes the code in tcindex_alloc_perfect_hash() prettier. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22Merge tag 'wireless-drivers-next-for-davem-2019-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-nextDavid S. Miller1-0/+6
Kalle Valo says: ==================== wireless-drivers-next patches for 5.1 Most likely the last set of patches for 5.1. WPA3 support to ath10k and qtnfmac. FTM support to iwlwifi and ath10k. And of course other new features and bugfixes. wireless-drivers was merged due to dependency in mt76. Major changes: iwlwifi * HE radiotap * FTM (Fine Timing Measurement) initiator and responder implementation * bump supported firmware API to 46 * VHT extended NSS support * new PCI IDs for 9260 and 22000 series ath10k * change QMI interface to support the new (and backwards incompatible) interface from HL3.1 and used in recent HL2.0 branch firmware releases * support WPA3 with WCN3990 * support for mac80211 airtime fairness based on transmit rate estimation, the firmware needs to support WMI_SERVICE_PEER_STATS to enable this * report transmit airtime to mac80211 with firmwares having WMI_SERVICE_REPORT_AIRTIME feature, this to have more accurate airtime fairness based on real transmit time (instead of just estimated from transmit rate) * support Fine Timing Measurement (FTM) responder role * add dynamic VLAN support with firmware having WMI_SERVICE_PER_PACKET_SW_ENCRYPT * switch to use SPDX license identifiers ath * add new country codes for US brcmfmac * support monitor frames with the hardware/ucode header qtnfmac * enable WPA3 SAE and OWE support mt76 * beacon support for USB devices (mesh+ad-hoc only) rtlwifi * convert to use SPDX license identifiers libertas_tf * get the MAC address before registering the device ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22Merge remote-tracking branch 'net-next/master' into mac80211-nextJohannes Berg26-74/+611
Merge net-next to resolve a conflict and to get the mac80211 rhashtable fixes so further patches can be applied on top. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-22cfg80211: allow sending vendor events unicastJohannes Berg1-2/+47
Sometimes, we may want to transport higher bandwidth data through vendor events, and in that case sending it multicast is a bad idea. Allow vendor events to be unicast. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-22mac80211: notify driver on subsequent CSA beaconsSara Sharon1-2/+7
Some drivers may want to track further the CSA beacons, for example to compensate for buggy APs that change the beacon count or quiet mode during CSA flow. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-22radiotap: add 0-length PSDU "not captured" typeJohannes Berg1-1/+2
This type was defined in radiotap but we didn't add it to the header file, add it now. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-22mac80211: abort CSA if beacon does not include CSA IEsSara Sharon1-0/+5
In case we receive a beacon without CSA IE while we are in the middle of channel switch - abort the operation. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-22mac80211: support max channel switch time elementSara Sharon1-0/+4
2018 REVmd of the spec introduces the max channel switch time element which is optionally included in beacons/probes when there is a channel switch / extended channel switch element. The value represents the maximum delay between the time the AP transmitted the last beacon in current channel and the expected time of the first beacon in the new channel, in TU. Parse the value and pass it to the driver. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-22cfg80211: Report Association Request frame IEs in association eventsJouni Malinen1-2/+5
This extends the NL80211_CMD_ASSOCIATE event case to report NL80211_ATTR_REQ_IE similarly to what is already done with the NL80211_CMD_CONNECT events if the driver provides this information. In practice, this adds (Re)Association Request frame information element reporting to mac80211 drivers for the cases where user space SME is used. This provides more information for user space to figure out which capabilities were negotiated for the association. For example, this can be used to determine whether HT, VHT, or HE is used. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-21phonet: fix building with clangArnd Bergmann1-2/+3
clang warns about overflowing the data[] member in the struct pnpipehdr: net/phonet/pep.c:295:8: warning: array index 4 is past the end of the array (which contains 1 element) [-Warray-bounds] if (hdr->data[4] == PEP_IND_READY) ^ ~ include/net/phonet/pep.h:66:3: note: array 'data' declared here u8 data[1]; Using a flexible array member at the end of the struct avoids the warning, but since we cannot have a flexible array member inside of the union, each index now has to be moved back by one, which makes it a little uglier. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: RĂ©mi Denis-Courmont <remi@remlab.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecDavid S. Miller1-3/+9
Steffen Klassert says: ==================== pull request (net): ipsec 2019-02-21 1) Don't do TX bytes accounting for the esp trailer when sending from a request socket as this will result in an out of bounds memory write. From Martin Willi. 2) Destroy xfrm_state synchronously on net exit path to avoid nested gc flush callbacks that may trigger a warning in xfrm6_tunnel_net_exit(). From Cong Wang. 3) Do an unconditionally clone in pfkey_broadcast_one() to avoid a race when freeing the skb. From Sean Tranchetti. 4) Fix inbound traffic via XFRM interfaces across network namespaces. We did the lookup for interfaces and policies in the wrong namespace. From Tobias Brunner. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net: Get rid of switchdev_port_attr_get()Florian Fainelli1-8/+0
With the bridge no longer calling switchdev_port_attr_get() to obtain the supported bridge port flags from a driver but instead trying to set the bridge port flags directly and relying on driver to reject unsupported configurations, we can effectively get rid of switchdev_port_attr_get() entirely since this was the only place where it was called. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net: Remove SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORTFlorian Fainelli1-2/+0
Now that we have converted the bridge code and the drivers to check for bridge port(s) flags at the time we try to set them, there is no need for a get() -> set() sequence anymore and SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT therefore becomes unused. Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net: switchdev: Add PORT_PRE_BRIDGE_FLAGSFlorian Fainelli1-1/+2
In preparation for removing switchdev_port_attr_get(), introduce PORT_PRE_BRIDGE_FLAGS which will be called through switchdev_port_attr_set(), in the caller's context (possibly atomic) and which must be checked by the switchdev driver in order to return whether the operation is supported or not. This is entirely analoguous to how the BRIDGE_FLAGS_SUPPORT works, except it goes through a set() instead of get(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net: dsa: add support for bridge flagsRussell King1-0/+2
The Linux bridge implementation allows various properties of the bridge to be controlled, such as flooding unknown unicast and multicast frames. This patch adds the necessary DSA infrastructure to allow the Linux bridge support to control these properties for DSA switches. Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> [florian: Add missing dp and ds variables declaration to fix build] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net/smc: add smcd support to the pnet tableHans Wippel1-0/+1
Currently, users can only set pnetids for netdevs and ib devices in the pnet table. This patch adds support for smcd devices to the pnet table. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-19net/tls: Move protocol constants from cipher context to tls contextVakul Garg1-17/+29
Each tls context maintains two cipher contexts (one each for tx and rx directions). For each tls session, the constants such as protocol version, ciphersuite, iv size, associated data size etc are same for both the directions and need to be stored only once per tls context. Hence these are moved from 'struct cipher_context' to 'struct tls_prot_info' and stored only once in 'struct tls_context'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller3-0/+29
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for you net-next tree: 1) Missing NFTA_RULE_POSITION_ID netlink attribute validation, from Phil Sutter. 2) Restrict matching on tunnel metadata to rx/tx path, from wenxu. 3) Avoid indirect calls for IPV6=y, from Florian Westphal. 4) Add two indirections to prepare merger of IPV4 and IPV6 nat modules, from Florian Westphal. 5) Broken indentation in ctnetlink, from Colin Ian King. 6) Patches to use struct_size() from netfilter and IPVS, from Gustavo A. R. Silva. 7) Display kernel splat only once in case of racing to confirm conntrack from bridge plus nfqueue setups, from Chieh-Min Wang. 8) Skip checksum validation for layer 4 protocols that don't need it, patch from Alin Nastac. 9) Sparse warning due to symbol that should be static in CLUSTERIP, from Wei Yongjun. 10) Add new toggle to disable SDP payload translation when media endpoint is reachable though the same interface as the signalling peer, from Alin Nastac. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17ethtool: add compat for flash updateJakub Kicinski1-0/+7
If driver does not support ethtool flash update operation call into devlink. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17devlink: add flash update commandJakub Kicinski1-0/+3
Add devlink flash update command. Advanced NICs have firmware stored in flash and often cryptographically secured. Updating that flash is handled by management firmware. Ethtool has a flash update command which served us well, however, it has two shortcomings: - it takes rtnl_lock unnecessarily - really flash update has nothing to do with networking, so using a networking device as a handle is suboptimal, which leads us to the second one: - it requires a functioning netdev - in case device enters an error state and can't spawn a netdev (e.g. communication with the device fails) there is no netdev to use as a handle for flashing. Devlink already has the ability to report the firmware versions, now with the ability to update the firmware/flash we will be able to recover devices in bad state. To enable updates of sub-components of the FW allow passing component name. This name should correspond to one of the versions reported in devlink info. v1: - replace target id with component name (Jiri). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2-0/+3
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-02-16 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) numerous libbpf API improvements, from Andrii, Andrey, Yonghong. 2) test all bpf progs in alu32 mode, from Jiong. 3) skb->sk access and bpf_sk_fullsock(), bpf_tcp_sock() helpers, from Martin. 4) support for IP encap in lwt bpf progs, from Peter. 5) remove XDP_QUERY_XSK_UMEM dead code, from Jan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-1/+2
The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13flow_offload: fix block statsJohn Hurley1-3/+3
With the introduction of flow_stats_update(), drivers now update the stats fields of the passed tc_cls_flower_offload struct, rather than call tcf_exts_stats_update() directly to update the stats of offloaded TC flower rules. However, if multiple qdiscs are registered to a TC shared block and a flower rule is applied, then, when getting stats for the rule, multiple callbacks may be made. Take this into consideration by modifying flow_stats_update to gather the stats from all callbacks. Currently, the values in tc_cls_flower_offload only account for the last stats callback in the list. Fixes: 3b1903ef97c0 ("flow_offload: add statistics retrieval infrastructure and use it") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: fix possible overflow in __sk_mem_raise_allocated()Eric Dumazet1-1/+1
With many active TCP sockets, fat TCP sockets could fool __sk_mem_raise_allocated() thanks to an overflow. They would increase their share of the memory, instead of decreasing it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13ipv6_stub: add ipv6_route_input stub/proxy.Peter Oskolkov1-0/+1
Proxy ip6_route_input via ipv6_stub, for later use by lwt bpf ip encap (see the next patch in the patchset). Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encapPeter Oskolkov1-0/+2
Implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers to packets (e.g. IP/GRE, GUE, IPIP). This is useful when thousands of different short-lived flows should be encapped, each with different and dynamically determined destination. Although lwtunnels can be used in some of these scenarios, the ability to dynamically generate encap headers adds more flexibility, e.g. when routing depends on the state of the host (reflected in global bpf maps). v7 changes: - added a call skb_clear_hash(); - removed calls to skb_set_transport_header(); - refuse to encap GSO-enabled packets. v8 changes: - fix build errors when LWT is not enabled. Note: the next patch in the patchset with deal with GSO-enabled packets, which are currently rejected at encapping attempt. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13netfilter: reject: skip csum verification for protocols that don't support itAlin Nastac3-0/+29
Some protocols have other means to verify the payload integrity (AH, ESP, SCTP) while others are incompatible with nf_ip(6)_checksum implementation because checksum is either optional or might be partial (UDPLITE, DCCP, GRE). Because nf_ip(6)_checksum was used to validate the packets, ip(6)tables REJECT rules were not capable to generate ICMP(v6) errors for the protocols mentioned above. This commit also fixes the incorrect pseudo-header protocol used for IPv4 packets that carry other transport protocols than TCP or UDP (pseudo-header used protocol 0 iso the proper value). Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12net: sched: add flags to Qdisc class ops structVlad Buslov1-0/+8
Extend Qdisc_class_ops with flags. Create enum to hold possible class ops flag values. Add first class ops flags value QDISC_CLASS_OPS_DOIT_UNLOCKED to indicate that class ops functions can be called without taking rtnl lock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: extend proto ops to support unlocked classifiersVlad Buslov2-6/+13
Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk functions to track rtnl lock status. Extend users of these function in cls API to propagate rtnl lock status to them. This allows classifiers to obtain rtnl lock when necessary and to pass rtnl lock status to extensions and driver offload callbacks. Add flags field to tcf proto ops. Add flag value to indicate that classifier doesn't require rtnl lock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: extend proto ops with 'put' callbackVlad Buslov1-0/+1
Add optional tp->ops->put() API to be implemented for filter reference counting. This new function is called by cls API to release filter reference for filters returned by tp->ops->change() or tp->ops->get() functions. Implement tfilter_put() helper to call tp->ops->put() only for classifiers that implement it. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: track rtnl lock status when validating extensionsVlad Buslov1-1/+1
Actions API is already updated to not rely on rtnl lock for synchronization. However, it need to be provided with rtnl status when called from classifiers API in order to be able to correctly release the lock when loading kernel module. Extend extension validation function with 'rtnl_held' flag which is passed to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls API. No classifier is currently updated to support unlocked execution, so pass hardcoded 'true' flag parameter value. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: prevent insertion of new classifiers during chain flushVlad Buslov1-0/+1
Extend tcf_chain with 'flushing' flag. Use the flag to prevent insertion of new classifier instances when chain flushing is in progress in order to prevent resource leak when tcf_proto is created by unlocked users concurrently. Return EAGAIN error from tcf_chain_tp_insert_unique() to restart tc_new_tfilter() and lookup the chain/proto again. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: refactor tp insert/delete for concurrent executionVlad Buslov1-0/+18
Implement unique insertion function to atomically attach tcf_proto to chain after verifying that no other tcf proto with specified priority exists. Implement delete function that verifies that tp is actually empty before deleting it. Use these functions to refactor cls API to account for concurrent tp and rule update instead of relying on rtnl lock. Add new 'deleting' flag to tcf proto. Use it to restart search when iterating over tp's on chain to prevent accessing potentially inval tp->next pointer. Extend tcf proto with spinlock that is intended to be used to protect its data from concurrent modification instead of relying on rtnl mutex. Use it to protect 'deleting' flag. Add lockdep macros to validate that lock is held when accessing protected fields. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: traverse classifiers in chain with tcf_get_next_proto()Vlad Buslov1-0/+2
All users of chain->filters_chain rely on rtnl lock and assume that no new classifier instances are added when traversing the list. Use tcf_get_next_proto() to traverse filters list without relying on rtnl mutex. This function iterates over classifiers by taking reference to current iterator classifier only and doesn't assume external synchronization of filters list. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: introduce reference counting for tcf_protoVlad Buslov1-0/+1
In order to remove dependency on rtnl lock and allow concurrent tcf_proto modification, extend tcf_proto with reference counter. Implement helper get/put functions for tcf proto and use them to modify cls API to always take reference to tcf_proto while using it. Only release reference to parent chain after releasing last reference to tp. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: protect filter_chain list with filter_chain_lock mutexVlad Buslov1-0/+17
Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain when accessing filter_chain list, instead of relying on rtnl lock. Dereference filter_chain with tcf_chain_dereference() lockdep macro to verify that all users of chain_list have the lock taken. Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute all necessary code while holding chain lock in order to prevent invalidation of chain_info structure by potential concurrent change. This also serializes calls to tcf_chain0_head_change(), which allows head change callbacks to rely on filter_chain_lock for synchronization instead of rtnl mutex. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>