aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-02-10sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_apiJiri Pirko13-24/+32
Creation is done in this file, move destruction to be at the same place. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: rename tcf_destroy to tcf_destroy_protoJiri Pirko2-6/+6
This function destroys TC filter protocol, not TC filter. So name it accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10ipv4: fib: Add events for FIB replace and appendIdo Schimmel1-13/+14
The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND} flags to signal routes being replaced or appended. Instead of using netlink flags for in-kernel notifications we can simply introduce two new events in the FIB notification chain. This has the added advantage of making the API cleaner, thereby making it clear that these events should be supported by listeners of the notification chain. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10ipv4: fib: Send notification before deleting FIB aliasIdo Schimmel1-7/+7
When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD notification is sent after the reference on the previous FIB info was dropped. This is problematic as potential listeners might need to access it in their notification blocks. Solve this by sending the notification prior to the deletion of the replaced FIB alias. This is consistent with ENTRY_DEL notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10ipv4: fib: Send deletion notification with actual FIB alias typeIdo Schimmel1-2/+2
When a FIB alias is removed, a notification is sent using the type passed from user space - can be RTN_UNSPEC - instead of the actual type of the removed alias. This is problematic for listeners of the FIB notification chain, as several FIB aliases can exist with matching parameters, but the type. Solve this by passing the actual type of the removed FIB alias. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10ipv4: fib: Only flush FIB aliases belonging to currently flushed tableIdo Schimmel1-1/+2
In case the MAIN table is flushed and its trie is shared with the LOCAL table, then we might be flushing FIB aliases belonging to the latter. This can lead to FIB_ENTRY_DEL notifications sent with the wrong table ID. The above doesn't affect current listeners, as the table ID is ignored during entry deletion, but this will change later in the patchset. When flushing a particular table, skip any aliases belonging to a different one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Patrick McHardy <kaber@trash.net> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Pack struct sw_flow_key.Jarno Rajahalme4-34/+39
struct sw_flow_key has two 16-bit holes. Move the most matched conntrack match fields there. In some typical cases this reduces the size of the key that needs to be hashed into half and into one cache line. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Add force commit.Jarno Rajahalme1-2/+24
Stateful network admission policy may allow connections to one direction and reject connections initiated in the other direction. After policy change it is possible that for a new connection an overlapping conntrack entry already exists, where the original direction of the existing connection is opposed to the new connection's initial packet. Most importantly, conntrack state relating to the current packet gets the "reply" designation based on whether the original direction tuple or the reply direction tuple matched. If this "directionality" is wrong w.r.t. to the stateful network admission policy it may happen that packets in neither direction are correctly admitted. This patch adds a new "force commit" option to the OVS conntrack action that checks the original direction of an existing conntrack entry. If that direction is opposed to the current packet, the existing conntrack entry is deleted and a new one is subsequently created in the correct direction. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Add original direction conntrack tuple to sw_flow_key.Jarno Rajahalme7-46/+227
Add the fields of the conntrack original direction 5-tuple to struct sw_flow_key. The new fields are initially marked as non-existent, and are populated whenever a conntrack action is executed and either finds or generates a conntrack entry. This means that these fields exist for all packets that were not rejected by conntrack as untrackable. The original tuple fields in the sw_flow_key are filled from the original direction tuple of the conntrack entry relating to the current packet, or from the original direction tuple of the master conntrack entry, if the current conntrack entry has a master. Generally, expected connections of connections having an assigned helper (e.g., FTP), have a master conntrack entry. The main purpose of the new conntrack original tuple fields is to allow matching on them for policy decision purposes, with the premise that the admissibility of tracked connections reply packets (as well as original direction packets), and both direction packets of any related connections may be based on ACL rules applying to the master connection's original direction 5-tuple. This also makes it easier to make policy decisions when the actual packet headers might have been transformed by NAT, as the original direction 5-tuple represents the packet headers before any such transformation. When using the original direction 5-tuple the admissibility of return and/or related packets need not be based on the mere existence of a conntrack entry, allowing separation of admission policy from the established conntrack state. While existence of a conntrack entry is required for admission of the return or related packets, policy changes can render connections that were initially admitted to be rejected or dropped afterwards. If the admission of the return and related packets was based on mere conntrack state (e.g., connection being in an established state), a policy change that would make the connection rejected or dropped would need to find and delete all conntrack entries affected by such a change. When using the original direction 5-tuple matching the affected conntrack entries can be allowed to time out instead, as the established state of the connection would not need to be the basis for packet admission any more. It should be noted that the directionality of related connections may be the same or different than that of the master connection, and neither the original direction 5-tuple nor the conntrack state bits carry this information. If needed, the directionality of the master connection can be stored in master's conntrack mark or labels, which are automatically inherited by the expected related connections. The fact that neither ARP nor ND packets are trackable by conntrack allows mutual exclusion between ARP/ND and the new conntrack original tuple fields. Hence, the IP addresses are overlaid in union with ARP and ND fields. This allows the sw_flow_key to not grow much due to this patch, but it also means that we must be careful to never use the new key fields with ARP or ND packets. ARP is easy to distinguish and keep mutually exclusive based on the ethernet type, but ND being an ICMPv6 protocol requires a bit more attention. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Inherit master's labels.Jarno Rajahalme1-14/+31
We avoid calling into nf_conntrack_in() for expected connections, as that would remove the expectation that we want to stick around until we are ready to commit the connection. Instead, we do a lookup in the expectation table directly. However, after a successful expectation lookup we have set the flow key label field from the master connection, whereas nf_conntrack_in() does not do this. This leads to master's labels being inherited after an expectation lookup, but those labels not being inherited after the corresponding conntrack action with a commit flag. This patch resolves the problem by changing the commit code path to also inherit the master's labels to the expected connection. Resolving this conflict in favor of inheriting the labels allows more information be passed from the master connection to related connections, which would otherwise be much harder if the 32 bits in the connmark are not enough. Labels can still be set explicitly, so this change only affects the default values of the labels in presense of a master connection. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Refactor labels initialization.Jarno Rajahalme1-42/+62
Refactoring conntrack labels initialization makes changes in later patches easier to review. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Simplify labels length logic.Jarno Rajahalme1-11/+9
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128 distinct labels"), the size of conntrack labels extension has fixed to 128 bits, so we do not need to check for labels sizes shorter than 128 at run-time. This patch simplifies labels length logic accordingly, but allows the conntrack labels size to be increased in the future without breaking the build. In the event of conntrack labels increasing in size OVS would still be able to deal with the 128 first label bits. Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Unionize ovs_key_ct_label with a u32 array.Jarno Rajahalme1-7/+8
Make the array of labels in struct ovs_key_ct_label an union, adding a u32 array of the same byte size as the existing u8 array. It is faster to loop through the labels 32 bits at the time, which is also the alignment of netlink attributes. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Do not trigger events for unconfirmed connections.Jarno Rajahalme1-6/+22
Receiving change events before the 'new' event for the connection has been received can be confusing. Avoid triggering change events for setting conntrack mark or labels before the conntrack entry has been confirmed. Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark") Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.Jarno Rajahalme1-2/+22
The conntrack lookup for existing connections fails to invert the packet 5-tuple for NATted packets, and therefore fails to find the existing conntrack entry. Conntrack only stores 5-tuples for incoming packets, and there are various situations where a lookup on a packet that has already been transformed by NAT needs to be made. Looking up an existing conntrack entry upon executing packet received from the userspace is one of them. This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple for the conntrack lookup whenever the packet has already been transformed by conntrack from its input form as evidenced by one of the NAT flags being set in the conntrack state metadata. Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Fix comments for skb->_nfctJarno Rajahalme1-7/+7
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that they are combined into '_nfct'. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09net: dsa: Fix duplicate object ruleFlorian Fainelli1-1/+0
While adding switch.o to the list of DSA object files, we essentially duplicated the previous obj-y line and just added switch.o, remove the duplicate. Fixes: f515f192ab4f ("net: dsa: add switch notifier") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request ParameterXin Long2-0/+106
This patch is to implement Sender-Side Procedures for the Add Outgoing and Incoming Streams Request Parameter described in rfc6525 section 5.1.5-5.1.6. It is also to add sockopt SCTP_ADD_STREAMS in rfc6525 section 6.3.4 for users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09sctp: add support for generating stream reconf add incoming/outgoing streams request chunkXin Long1-0/+46
This patch is to define Add Incoming/Outgoing Streams Request Parameter described in rfc6525 section 4.5 and 4.6. They can be in one same chunk trunk as rfc6525 section 3.1-7 describes, so make them in one function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09sctp: implement sender-side procedures for SSN/TSN Reset Request ParameterXin Long2-0/+69
This patch is to implement Sender-Side Procedures for the SSN/TSN Reset Request Parameter descibed in rfc6525 section 5.1.4. It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3 for users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09sctp: add support for generating stream reconf ssn/tsn reset request chunkXin Long1-0/+29
This patch is to define SSN/TSN Reset Request Parameter described in rfc6525 section 4.3. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09sctp: streams should be recovered when it fails to send request.Xin Long1-2/+17
Now when sending stream reset request, it closes the streams to block further xmit of data until this request is completed, then calls sctp_send_reconf to send the chunk. But if sctp_send_reconf returns err, and it doesn't recover the streams' states back, which means the request chunk would not be queued and sent, so the asoc will get stuck, streams are closed and no packet is even queued. This patch is to fix it by recovering the streams' states when it fails to send the request, it is also to fix a return value. Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08ipv4: fib: Notify about nexthop status changesIdo Schimmel1-0/+33
When a multipath route is hit the kernel doesn't consider nexthops that are DEAD or LINKDOWN when IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN is set. Devices that offload multipath routes need to be made aware of nexthop status changes. Otherwise, the device will keep forwarding packets to non-functional nexthops. Add the FIB_EVENT_NH_{ADD,DEL} events to the fib notification chain, which notify capable devices when they should add or delete a nexthop from their tables. Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08bridge: vlan tunnel id info range fill size calc cleanupsRoopa Prabhu1-18/+16
This fixes a bug and cleans up tunnelid range size calculation code by using consistent variable names and checks in size calculation and fill functions. tested for a few cases of vlan-vni range mappings: (output from patched iproute2): $bridge vlan showtunnel port vid tunid vxlan0 100-105 1000-1005 200 2000 210 2100 211-213 2100-2102 214 2104 216-217 2108-2109 219 2119 Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08gro_cells: move to net/core/gro_cells.cEric Dumazet6-0/+100
We have many gro cells users, so lets move the code to avoid duplication. This creates a CONFIG_GRO_CELLS option. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller18-97/+103
The conflict was an interaction between a bug fix in the netvsc driver in 'net' and an optimization of the RX path in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds17-96/+102
Pull networking fixes from David Miller: 1) Load correct firmware in rtl8192ce wireless driver, from Jurij Smakov. 2) Fix leak of tx_ring and tx_cq due to overwriting in mlx4 driver, from Martin KaFai Lau. 3) Need to reference count PHY driver module when it is attached, from Mao Wenan. 4) Don't do zero length vzalloc() in ethtool register dump, from Stanislaw Gruszka. 5) Defer net_disable_timestamp() to a workqueue to get out of locking issues, from Eric Dumazet. 6) We cannot drop the SKB dst when IP options refer to them, fix also from Eric Dumazet. 7) Incorrect packet header offset calculations in ip6_gre, again from Eric Dumazet. 8) Missing tcp_v6_restore_cb() causes use-after-free, from Eric too. 9) tcp_splice_read() can get into an infinite loop with URG, and hey it's from Eric once more. 10) vnet_hdr_sz can change asynchronously, so read it once during decision making in macvtap and tun, from Willem de Bruijn. 11) Can't use kernel stack for DMA transfers in USB networking drivers, from Ben Hutchings. 12) Handle csum errors properly in UDP by calling the proper destructor, from Eric Dumazet. 13) For non-deterministic softirq run when scheduling NAPI from a workqueue in mlx4, from Benjamin Poirier. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits) sctp: check af before verify address in sctp_addr_id2transport sctp: avoid BUG_ON on sctp_wait_for_sndbuf mlx4: Invoke softirqs after napi_reschedule udp: properly cope with csum errors catc: Use heap buffer for memory size test catc: Combine failure cleanup code in catc_probe() rtl8150: Use heap buffers for all register access pegasus: Use heap buffers for all register access macvtap: read vnet_hdr_size once tun: read vnet_hdr_sz once tcp: avoid infinite loop in tcp_splice_read() hns: avoid stack overflow with CONFIG_KASAN ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches ipv6: tcp: add a missing tcp_v6_restore_cb() nl80211: Fix mesh HT operation check mac80211: Fix adding of mesh vendor IEs mac80211: Allocate a sync skcipher explicitly for FILS AEAD mac80211: Fix FILS AEAD protection in Association Request frame ip6_gre: fix ip6gre_err() invalid reads netlabel: out of bound access in cipso_v4_validate() ...
2017-02-07bridge: avoid unnecessary read of jiffiesstephen hemminger2-4/+8
Jiffies is volatile so read it once. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07bridge: remove unnecessary check for vtbegin in br_fill_vlan_tinfo_rangeRoopa Prabhu1-1/+1
vtbegin should not be NULL in this function, Its already checked by the caller. this should silence the below smatch complaint: net/bridge/br_netlink_tunnel.c:144 br_fill_vlan_tinfo_range() error: we previously assumed 'vtbegin' could be null (see line 130) net/bridge/br_netlink_tunnel.c 129 130 if (vtbegin && vtend && (vtend->vid - vtbegin->vid) > 0) { ^^^^^^^ Check for NULL. Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Reported-By: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07sctp: check af before verify address in sctp_addr_id2transportXin Long1-1/+1
Commit 6f29a1306131 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc") invoked sctp_verify_addr to verify the addr. But it didn't check af variable beforehand, once users pass an address with family = 0 through sockopt, sctp_get_af_specific will return NULL and NULL pointer dereference will be caused by af->sockaddr_len. This patch is to fix it by returning NULL if af variable is NULL. Fixes: 6f29a1306131 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07bridge: tunnel: fix attribute checks in br_parse_vlan_tunnel_infoNikolay Aleksandrov1-4/+4
These checks should go after the attributes have been parsed otherwise we're using tb uninitialized. Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: bridge: remove redundant check to see if err is setColin Ian King1-3/+0
The error check on err is redundant as it is being checked previously each time it has been updated. Remove this redundant check. Detected with CoverityScan, CID#140030("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: dsa: Do not clobber PHY link outside of state machineFlorian Fainelli1-7/+3
Calling phy_read_status() means that we may call into genphy_read_status() which in turn will use genphy_update_link() which can make changes to phydev->link outside of the state machine's state transitions. This is an invalid behavior that is now caught as of 811a919135b9 ("phy state machine: failsafe leave invalid RUNNING state") Reported-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: pending_confirm is not used anymoreJulian Anastasov1-1/+0
When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. As last step, we can remove the pending_confirm flag. Reported-by: YueHaibing <yuehaibing@huawei.com> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.") Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TPJulian Anastasov9-19/+44
When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. The datagram protocols can use MSG_CONFIRM to confirm the neighbour. When used with MSG_PROBE we do not reach the code where neighbour is confirmed, so we have to do the same slow lookup by using the dst_confirm_neigh() helper. When MSG_PROBE is not used, ip_append_data/ip6_append_data will set the skb flag dst_pending_confirm. Reported-by: YueHaibing <yuehaibing@huawei.com> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.") Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: add confirm_neigh method to dst_opsJulian Anastasov3-0/+54
Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6 to lookup and confirm the neighbour. Its usage via the new helper dst_confirm_neigh() should be restricted to MSG_PROBE users for performance reasons. For XFRM prefer the last tunnel address, if present. With help from Steffen Klassert. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07tcp: replace dst_confirm with sk_dst_confirmJulian Anastasov3-14/+7
When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. Use the new sk_dst_confirm() helper to propagate the indication from received packets to sock_confirm_neigh(). Reported-by: YueHaibing <yuehaibing@huawei.com> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.") Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.") Tested-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07sctp: add dst_pending_confirm flagJulian Anastasov7-12/+31
Add new transport flag to allow sockets to confirm neighbour. When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. The flag is propagated from transport to every packet. It is reset when cached dst is reset. Reported-by: YueHaibing <yuehaibing@huawei.com> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.") Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: add dst_pending_confirm flag to skbuffJulian Anastasov2-1/+5
Add new skbuff flag to allow protocols to confirm neighbour. When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. Add sock_confirm_neigh() helper to confirm the neighbour and use it for IPv4, IPv6 and VRF before dst_neigh_output. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07sock: add sk_dst_pending_confirm flagJulian Anastasov1-0/+2
Add new sock flag to allow sockets to confirm neighbour. When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. As not all call paths lock the socket use full word for the flag. Add sk_dst_confirm as replacement for dst_confirm when called for received packets. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07sctp: avoid BUG_ON on sctp_wait_for_sndbufMarcelo Ricardo Leitner1-1/+2
Alexander Popov reported that an application may trigger a BUG_ON in sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is waiting on it to queue more data and meanwhile another thread peels off the association being used by the first thread. This patch replaces the BUG_ON call with a proper error handling. It will return -EPIPE to the original sendmsg call, similarly to what would have been done if the association wasn't found in the first place. Acked-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07ipv6: sr: fix non static symbol warningsWei Yongjun1-4/+4
Fixes the following sparse warnings: net/ipv6/seg6_iptunnel.c:58:5: warning: symbol 'nla_put_srh' was not declared. Should it be static? net/ipv6/seg6_iptunnel.c:238:5: warning: symbol 'seg6_input' was not declared. Should it be static? net/ipv6/seg6_iptunnel.c:254:5: warning: symbol 'seg6_output' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net/sched: act_mirred: remove duplicated include from act_mirred.cWei Yongjun1-2/+0
Remove duplicated include. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07udp: properly cope with csum errorsEric Dumazet3-4/+8
Dmitry reported that UDP sockets being destroyed would trigger the WARN_ON(atomic_read(&sk->sk_rmem_alloc)); in inet_sock_destruct() It turns out we do not properly destroy skb(s) that have wrong UDP checksum. Thanks again to syzkaller team. Fixes : 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: dsa: Add support for platform dataFlorian Fainelli1-18/+84
Allow drivers to use the new DSA API with platform data. Most of the code in net/dsa/dsa2.c does not rely so much on device_nodes and can get the same information from platform_data instead. We purposely do not support distributed configurations with platform data, so drivers should be providing a pointer to a 'struct dsa_chip_data' structure if they wish to communicate per-port layout. Multiple CPUs port could potentially be supported and dsa_chip_data is extended to receive up to one reference to an upstream network device per port described by a dsa_chip_data structure. dsa_dev_to_net_device() increments the network device's reference count, so we intentionally call dev_put() to be consistent with the DT-enabled path, until we have a generic notifier based solution. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: dsa: Rename and export dev_to_net_device()Florian Fainelli1-2/+3
In preparation for using this function in net/dsa/dsa2.c, rename the function to make its scope DSA specific, and export it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06bridge: fdb: write to used and updated at most once per jiffyNikolay Aleksandrov2-2/+4
Writing once per jiffy is enough to limit the bridge's false sharing. After this change the bridge doesn't show up in the local load HitM stats. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06bridge: move write-heavy fdb members in their own cache lineNikolay Aleksandrov1-4/+6
Fdb's used and updated fields are written to on every packet forward and packet receive respectively. Thus if we are receiving packets from a particular fdb, they'll cause false-sharing with everyone who has looked it up (even if it didn't match, since mac/vid share cache line!). The "used" field is even worse since it is updated on every packet forward to that fdb, thus the standard config where X ports use a single gateway results in 100% fdb false-sharing. Note that this patch does not prevent the last scenario, but it makes it better for other bridge participants which are not using that fdb (and are only doing lookups over it). The point is with this move we make sure that only communicating parties get the false-sharing, in a later patch we'll show how to avoid that too. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06bridge: move to workqueue gcNikolay Aleksandrov10-23/+29
Move the fdb garbage collector to a workqueue which fires at least 10 milliseconds apart and cleans chain by chain allowing for other tasks to run in the meantime. When having thousands of fdbs the system is much more responsive. Most importantly remove the need to check if the matched entry has expired in __br_fdb_get that causes false-sharing and is completely unnecessary if we cleanup entries, at worst we'll get 10ms of traffic for that entry before it gets deleted. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06bridge: modify bridge and port to have often accessed fields in one cache lineNikolay Aleksandrov1-23/+20
Move around net_bridge so the vlan fields are in the beginning since they're checked on every packet even if vlan filtering is disabled. For the port move flags & vlan group to the beginning, so they're in the same cache line with the port's state (both flags and state are checked on each packet). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>