aboutsummaryrefslogtreecommitdiffstats
path: root/net/bridge (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-05-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-0/+20
Pablo Neira Ayuso says: ==================== Netfilter/IPVS/OVS fixes for net The following patchset contains a rather large batch of Netfilter, IPVS and OVS fixes for your net tree. This includes fixes for ctnetlink, the userspace conntrack helper infrastructure, conntrack OVS support, ebtables DNAT target, several leaks in error path among other. More specifically, they are: 1) Fix reference count leak in the CT target error path, from Gao Feng. 2) Remove conntrack entry clashing with a matching expectation, patch from Jarno Rajahalme. 3) Fix bogus EEXIST when registering two different userspace helpers, from Liping Zhang. 4) Don't leak dummy elements in the new bitmap set type in nf_tables, from Liping Zhang. 5) Get rid of module autoload from conntrack update path in ctnetlink, we don't need autoload at this late stage and it is happening with rcu read lock held which is not good. From Liping Zhang. 6) Fix deadlock due to double-acquire of the expect_lock from conntrack update path, this fixes a bug that was introduced when the central spinlock got removed. Again from Liping Zhang. 7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock protection that was selected when the central spinlock was removed was not really protecting anything at all. 8) Protect sequence adjustment under ct->lock. 9) Missing socket match with IPv6, from Peter Tirsek. 10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from Linus Luessing. 11) Don't give up on evaluating the expression on new entries added via dynset expression in nf_tables, from Liping Zhang. 12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals with non-linear skbuffs. 13) Don't allow IPv6 service in IPVS if no IPv6 support is available, from Paolo Abeni. 14) Missing mutex release in error path of xt_find_table_lock(), from Dan Carpenter. 15) Update maintainers files, Netfilter section. Add Florian to the file, refer to nftables.org and change project status from Supported to Maintained. 16) Bail out on mismatching extensions in element updates in nf_tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller5-51/+48
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for your net-next tree. A large bunch of code cleanups, simplify the conntrack extension codebase, get rid of the fake conntrack object, speed up netns by selective synchronize_net() calls. More specifically, they are: 1) Check for ct->status bit instead of using nfct_nat() from IPVS and Netfilter codebase, patch from Florian Westphal. 2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao. 3) Simplify FTP IPVS helper module registration path, from Arushi Singhal. 4) Introduce nft_is_base_chain() helper function. 5) Enforce expectation limit from userspace conntrack helper, from Gao Feng. 6) Add nf_ct_remove_expect() helper function, from Gao Feng. 7) NAT mangle helper function return boolean, from Gao Feng. 8) ctnetlink_alloc_expect() should only work for conntrack with helpers, from Gao Feng. 9) Add nfnl_msg_type() helper function to nfnetlink to build the netlink message type. 10) Get rid of unnecessary cast on void, from simran singhal. 11) Use seq_puts()/seq_putc() instead of seq_printf() where possible, also from simran singhal. 12) Use list_prev_entry() from nf_tables, from simran signhal. 13) Remove unnecessary & on pointer function in the Netfilter and IPVS code. 14) Remove obsolete comment on set of rules per CPU in ip6_tables, no longer true. From Arushi Singhal. 15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng. 16) Remove unnecessary nested rcu_read_lock() in __nf_nat_decode_session(). Code running from hooks are already guaranteed to run under RCU read side. 17) Remove deadcode in nf_tables_getobj(), from Aaron Conole. 18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(), also from Aaron. 19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole. 20) Don't propagate NF_DROP error to userspace via ctnetlink in __nf_nat_alloc_null_binding() function, from Gao Feng. 21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks, from Gao Feng. 22) Kill the fake untracked conntrack objects, use ctinfo instead to annotate a conntrack object is untracked, from Florian Westphal. 23) Remove nf_ct_is_untracked(), now obsolete since we have no conntrack template anymore, from Florian. 24) Add event mask support to nft_ct, also from Florian. 25) Move nf_conn_help structure to include/net/netfilter/nf_conntrack_helper.h. 26) Add a fixed 32 bytes scratchpad area for conntrack helpers. Thus, we don't deal with variable conntrack extensions anymore. Make sure userspace conntrack helper doesn't go over that size. Remove variable size ct extension infrastructure now this code got no more clients. From Florian Westphal. 27) Restore offset and length of nf_ct_ext structure to 8 bytes now that wraparound is not possible any longer, also from Florian. 28) Allow to get rid of unassured flows under stress in conntrack, this applies to DCCP, SCTP and TCP protocols, from Florian. 29) Shrink size of nf_conntrack_ecache structure, from Florian. 30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker, from Gao Feng. 31) Register SYNPROXY hooks on demand, from Florian Westphal. 32) Use pernet hook whenever possible, instead of global hook registration, from Florian Westphal. 33) Pass hook structure to ebt_register_table() to consolidate some infrastructure code, from Florian Westphal. 34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the SYNPROXY code, to make sure device stats are not fooled, patch from Gao Feng. 35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we don't need anymore if we just select a fixed size instead of expensive runtime time calculation of this. From Florian. 36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(), from Florian. 37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from Florian. 38) Attach NAT extension on-demand from masquerade and pptp helper path, from Florian. 39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole. 40) Speed up netns by selective calls of synchronize_net(), from Florian Westphal. 41) Silence stack size warning gcc in 32-bit arch in snmp helper, from Florian. 42) Inconditionally call nf_ct_ext_destroy(), even if we have no extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30net: bridge: Fix improper taking over HW learned FDBArkadi Sharshevsky1-5/+3
Commit 7e26bf45e4cb ("net: bridge: allow SW learn to take over HW fdb entries") added the ability to "take over an entry which was previously learned via HW when it shows up from a SW port". However, if an entry was learned via HW and then a control packet (e.g., ARP request) was trapped to the CPU, the bridge driver will update the entry and remove the externally learned flag, although the entry is still present in HW. Instead, only clear the externally learned flag in case of roaming. Fixes: 7e26bf45e4cb ("net: bridge: allow SW learn to take over HW fdb entries") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Arkadi Sharashevsky <arkadis@mellanox.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-27bridge: add per-port broadcast flood flagMike Manning4-8/+23
Support for l2 multicast flood control was added in commit b6cb5ac8331b ("net: bridge: add per-port multicast flood flag"). It allows broadcast as it was introduced specifically for unknown multicast flood control. But as broadcast is a special case of multicast, this may also need to be disabled. For this purpose, introduce a flag to disable the flooding of received l2 broadcasts. This approach is backwards compatible and provides flexibility in filtering for the desired packet types. Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Mike Manning <mmanning@brocade.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26ebtables: remove nf_hook_register usageFlorian Westphal4-49/+46
Similar to ip_register_table, pass nf_hook_ops to ebt_register_table(). This allows to handle hook registration also via pernet_ops and allows us to avoid use of legacy register_hook api. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-25bridge: move bridge multicast cleanup to ndo_uninitXin Long2-1/+1
During removing a bridge device, if the bridge is still up, a new mdb entry still can be added in br_multicast_add_group() after all mdb entries are removed in br_multicast_dev_del(). Like the path: mld_ifc_timer_expire -> mld_sendpack -> ... br_multicast_rcv -> br_multicast_add_group The new mp's timer will be set up. If the timer expires after the bridge is freed, it may cause use-after-free panic in br_multicast_group_expired. BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 IP: [<ffffffffa07ed2c8>] br_multicast_group_expired+0x28/0xb0 [bridge] Call Trace: <IRQ> [<ffffffff81094536>] call_timer_fn+0x36/0x110 [<ffffffffa07ed2a0>] ? br_mdb_free+0x30/0x30 [bridge] [<ffffffff81096967>] run_timer_softirq+0x237/0x340 [<ffffffff8108dcbf>] __do_softirq+0xef/0x280 [<ffffffff8169889c>] call_softirq+0x1c/0x30 [<ffffffff8102c275>] do_softirq+0x65/0xa0 [<ffffffff8108e055>] irq_exit+0x115/0x120 [<ffffffff81699515>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff81697a5d>] apic_timer_interrupt+0x6d/0x80 Nikolay also found it would cause a memory leak - the mdb hash is reallocated and not freed due to the mdb rehash. unreferenced object 0xffff8800540ba800 (size 2048): backtrace: [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0 [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0 [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge] [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge] [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge] [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge] [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge] [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0 [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10 [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20 [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6] [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6] [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6] [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6] [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6] [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6] This could happen when ip link remove a bridge or destroy a netns with a bridge device inside. With Nikolay's suggestion, this patch is to clean up bridge multicast in ndo_uninit after bridge dev is shutdown, instead of br_dev_delete, so that netif_running check in br_multicast_add_group can avoid this issue. v1->v2: - fix this issue by moving br_multicast_dev_del to ndo_uninit, instead of calling dev_close in br_dev_delete. (NOTE: Depends upon b6fe0440c637 ("bridge: implement missing ndo_uninit()")) Fixes: e10177abf842 ("bridge: multicast: fix handling of temp and perm entries") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25bridge: ebtables: fix reception of frames DNAT-ed to bridge device/portLinus Lüssing1-0/+20
When trying to redirect bridged frames to the bridge device itself or a bridge port (brouting) via the dnat target then this currently fails: The ethernet destination of the frame is dnat'ed to the MAC address of the bridge device or port just fine. However, the IP code drops it in the beginning of ip_input.c/ip_rcv() as the dnat target left the skb->pkt_type as PACKET_OTHERHOST. Fixing this by resetting skb->pkt_type to an appropriate type after dnat'ing. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-17net: rtnetlink: plumb extended ack to doit functionDavid Ahern1-2/+4
Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse for doit functions that call it directly. This is the first step to using extended error reporting in rtnetlink. >From here individual subsystems can be updated to set netlink_ext_ack as needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: bridge: notify on hw fdb takeoverNikolay Aleksandrov1-1/+3
Recently we added support for SW fdbs to take over HW ones, but that results in changing a user-visible fdb flag thus we need to send a notification, also it's consistent with how HW takes over SW entries. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-14/+26
Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13netlink: pass extended ACK struct to parsing functionsJohannes Berg3-5/+6
Pass the new extended ACK reporting struct to all of the generic netlink parsing functions. For now, pass NULL in almost all callers (except for some in the core.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11bridge: netlink: register netdevice before executing changelinkIdo Schimmel1-2/+5
Peter reported a kernel oops when executing the following command: $ ip link add name test type bridge vlan_default_pvid 1 [13634.939408] BUG: unable to handle kernel NULL pointer dereference at 0000000000000190 [13634.939436] IP: __vlan_add+0x73/0x5f0 [...] [13634.939783] Call Trace: [13634.939791] ? pcpu_next_unpop+0x3b/0x50 [13634.939801] ? pcpu_alloc+0x3d2/0x680 [13634.939810] ? br_vlan_add+0x135/0x1b0 [13634.939820] ? __br_vlan_set_default_pvid.part.28+0x204/0x2b0 [13634.939834] ? br_changelink+0x120/0x4e0 [13634.939844] ? br_dev_newlink+0x50/0x70 [13634.939854] ? rtnl_newlink+0x5f5/0x8a0 [13634.939864] ? rtnl_newlink+0x176/0x8a0 [13634.939874] ? mem_cgroup_commit_charge+0x7c/0x4e0 [13634.939886] ? rtnetlink_rcv_msg+0xe1/0x220 [13634.939896] ? lookup_fast+0x52/0x370 [13634.939905] ? rtnl_newlink+0x8a0/0x8a0 [13634.939915] ? netlink_rcv_skb+0xa1/0xc0 [13634.939925] ? rtnetlink_rcv+0x24/0x30 [13634.939934] ? netlink_unicast+0x177/0x220 [13634.939944] ? netlink_sendmsg+0x2fe/0x3b0 [13634.939954] ? _copy_from_user+0x39/0x40 [13634.939964] ? sock_sendmsg+0x30/0x40 [13634.940159] ? ___sys_sendmsg+0x29d/0x2b0 [13634.940326] ? __alloc_pages_nodemask+0xdf/0x230 [13634.940478] ? mem_cgroup_commit_charge+0x7c/0x4e0 [13634.940592] ? mem_cgroup_try_charge+0x76/0x1a0 [13634.940701] ? __handle_mm_fault+0xdb9/0x10b0 [13634.940809] ? __sys_sendmsg+0x51/0x90 [13634.940917] ? entry_SYSCALL_64_fastpath+0x1e/0xad The problem is that the bridge's VLAN group is created after setting the default PVID, when registering the netdevice and executing its ndo_init(). Fix this by changing the order of both operations, so that br_changelink() is only processed after the netdevice is registered, when the VLAN group is already initialized. Fixes: b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Peter V. Saveliev <peter@svinota.eu> Tested-by: Peter V. Saveliev <peter@svinota.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11bridge: implement missing ndo_uninit()Ido Schimmel4-12/+21
While the bridge driver implements an ndo_init(), it was missing a symmetric ndo_uninit(), causing the different de-initialization operations to be scattered around its dellink() and destructor(). Implement a symmetric ndo_uninit() and remove the overlapping operations from its dellink() and destructor(). This is a prerequisite for the next patch, as it allows us to have a proper cleanup upon changelink() failure during the bridge's newlink(). Fixes: b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07netfilter: Remove exceptional & on function nameArushi Singhal1-1/+1
Remove & from function pointers to conform to the style found elsewhere in the file. Done using the following semantic patch // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-07netfilter: Remove unnecessary cast on void pointersimran singhal1-1/+1
The following Coccinelle script was used to detect this: @r@ expression x; void* e; type T; identifier f; @@ ( *((T *)e) | ((T *)x)[...] | ((T*)x)->f | - (T*) e ) Unnecessary parantheses are also remove. Signed-off-by: simran singhal <singhalsimran0@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-28net: break include loop netdevice.h, dsa.h, devlink.hAndrew Lunn1-0/+1
There is an include loop between netdevice.h, dsa.h, devlink.h because of NETDEV_ALIGN, making it impossible to use devlink structures in dsa.h. Break this loop by taking dsa.h out of netdevice.h, add a forward declaration of dsa_switch_tree and netdev_set_default_ethtool_ops() function, which is what netdevice.h requires. No longer having dsa.h in netdevice.h means the includes in dsa.h no longer get included. This breaks a few other files which depend on these includes. Add these directly in the affected file. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: bridge: allow to add externally learned entries from user-spaceNikolay Aleksandrov1-0/+2
The NTF_EXT_LEARNED flag was added for switchdev and externally learned entries, but it can also be used for entries learned via a software in user-space which requires dynamic entries that do not expire. One such case that we have is with quagga and evpn which need dynamic entries but also require to age them themselves. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: bridge: allow SW learn to take over HW fdb entriesNikolay Aleksandrov1-0/+3
Allow to take over an entry which was previously learned via HW when it shows up from a SW port. This is analogous to how HW takes over SW learned entries already. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-15/+8
Conflicts: drivers/net/ethernet/broadcom/genet/bcmmii.c drivers/net/hyperv/netvsc.c kernel/bpf/hashtab.c Almost entirely overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller3-25/+18
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. A couple of new features for nf_tables, and unsorted cleanups and incremental updates for the Netfilter tree. More specifically, they are: 1) Allow to check for TCP option presence via nft_exthdr, patch from Phil Sutter. 2) Add symmetric hash support to nft_hash, from Laura Garcia Liebana. 3) Use pr_cont() in ebt_log, from Joe Perches. 4) Remove some dead code in arp_tables reported via static analysis tool, from Colin Ian King. 5) Consolidate nf_tables expression validation, from Liping Zhang. 6) Consolidate set lookup via nft_set_lookup(). 7) Remove unnecessary rcu read lock side in bridge netfilter, from Florian Westphal. 8) Remove unused variable in nf_reject_ipv4, from Tahee Yoo. 9) Pass nft_ctx struct to object initialization indirections, from Florian Westphal. 10) Add code to integrate conntrack helper into nf_tables, also from Florian. 11) Allow to check if interface index or name exists via NFTA_FIB_F_PRESENT, from Phil Sutter. 12) Simplify resolve_normal_ct(), from Florian. 13) Use per-limit spinlock in nft_limit and xt_limit, from Liping Zhang. 14) Use rwlock in nft_set_rbtree set, also from Liping Zhang. 15) One patch to remove a useless printk at netns init path in ipvs, and several patches to document IPVS knobs. 16) Use refcount_t for reference counter in the Netfilter/IPVS code, from Elena Reshetova. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16bridge: resolve a false alarm of lockdepWANG Cong2-10/+1
Andrei reported a false alarm of lockdep at net/bridge/br_fdb.c:109, this is because in Andrei's case, a spin_bug() was already triggered before this, therefore the debug_locks is turned off, lockdep_is_held() is no longer accurate after that. We should use lockdep_assert_held_once() instead of lockdep_is_held() to respect debug_locks. Fixes: 410b3d48f5111 ("bridge: fdb: add proper lock checks in searching functions") Reported-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-5/+7
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, a rather large batch of fixes targeted to nf_tables, conntrack and bridge netfilter. More specifically, they are: 1) Don't track fragmented packets if the socket option IP_NODEFRAG is set. From Florian Westphal. 2) SCTP protocol tracker assumes that ICMP error messages contain the checksum field, what results in packet drops. From Ying Xue. 3) Fix inconsistent handling of AH traffic from nf_tables. 4) Fix new bitmap set representation with big endian. Fix mismatches in nf_tables due to incorrect big endian handling too. Both patches from Liping Zhang. 5) Bridge netfilter doesn't honor maximum fragment size field, cap to largest fragment seen. From Florian Westphal. 6) Fake conntrack entry needs to be aligned to 8 bytes since the 3 LSB bits are now used to store the ctinfo. From Steven Rostedt. 7) Fix element comments with the bitmap set type. Revert the flush field in the nft_set_iter structure, not required anymore after fixing up element comments. 8) Missing error on invalid conntrack direction from nft_ct, also from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13bridge: drop netfilter fake rtable unconditionallyFlorian Westphal2-21/+1
Andreas reports kernel oops during rmmod of the br_netfilter module. Hannes debugged the oops down to a NULL rt6info->rt6i_indev. Problem is that br_netfilter has the nasty concept of adding a fake rtable to skb->dst; this happens in a br_netfilter prerouting hook. A second hook (in bridge LOCAL_IN) is supposed to remove these again before the skb is handed up the stack. However, on module unload hooks get unregistered which means an skb could traverse the prerouting hook that attaches the fake_rtable, while the 'fake rtable remove' hook gets removed from the hooklist immediately after. Fixes: 34666d467cbf1e2e3c7 ("netfilter: bridge: move br_netfilter out of the core") Reported-by: Andreas Karis <akaris@redhat.com> Debugged-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13netfilter: bridge: honor frag_max_size when refragmentingFlorian Westphal1-5/+7
consider a bridge with mtu 9000, but end host sending smaller packets to another host with mtu < 9000. In this case, after reassembly, bridge+defrag would refragment, and then attempt to send the reassembled packet as long as it was below 9k. Instead we have to cap by the largest fragment size seen. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-08netfilter: bridge: remove unneeded rcu_read_lockFlorian Westphal1-3/+0
as comment says, the function is always called with rcu read lock held. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06netfilter: nf_tables: validate the expr explicitly after init successfullyLiping Zhang1-5/+1
When we want to validate the expr's dependency or hooks, we must do two things to accomplish it. First, write a X_validate callback function and point ->validate to it. Second, call X_validate in init routine. This is very common, such as fib, nat, reject expr and so on ... It is a little ugly, since we will call X_validate in the expr's init routine, it's better to do it in nf_tables_newexpr. So we can avoid to do this again and again. After doing this, the second step listed above is not useful anymore, remove them now. Patch was tested by nftables/tests/py/nft-test.py and nftables/tests/shell/run-tests.sh. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06netfilter: Use pr_cont where appropriateJoe Perches1-17/+17
Logging output was changed when simple printks without KERN_CONT are now emitted on a new line and KERN_CONT is required to continue lines so use pr_cont. Miscellanea: o realign arguments o use print_hex_dump instead of a local variant Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-2/+3
Pull networking fixes from David Miller: 1) Fix double-free in batman-adv, from Sven Eckelmann. 2) Fix packet stats for fast-RX path, from Joannes Berg. 3) Netfilter's ip_route_me_harder() doesn't handle request sockets properly, fix from Florian Westphal. 4) Fix sendmsg deadlock in rxrpc, from David Howells. 5) Add missing RCU locking to transport hashtable scan, from Xin Long. 6) Fix potential packet loss in mlxsw driver, from Ido Schimmel. 7) Fix race in NAPI handling between poll handlers and busy polling, from Eric Dumazet. 8) TX path in vxlan and geneve need proper RCU locking, from Jakub Kicinski. 9) SYN processing in DCCP and TCP need to disable BH, from Eric Dumazet. 10) Properly handle net_enable_timestamp() being invoked from IRQ context, also from Eric Dumazet. 11) Fix crash on device-tree systems in xgene driver, from Alban Bedel. 12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de Melo. 13) Fix use-after-free in netvsc driver, from Dexuan Cui. 14) Fix max MTU setting in bonding driver, from WANG Cong. 15) xen-netback hash table can be allocated from softirq context, so use GFP_ATOMIC. From Anoob Soman. 16) Fix MAC address change bug in bgmac driver, from Hari Vyas. 17) strparser needs to destroy strp_wq on module exit, from WANG Cong. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) strparser: destroy workqueue on module exit sfc: fix IPID endianness in TSOv2 sfc: avoid max() in array size rds: remove unnecessary returned value check rxrpc: Fix potential NULL-pointer exception nfp: correct DMA direction in XDP DMA sync nfp: don't tell FW about the reserved buffer space net: ethernet: bgmac: mac address change bug net: ethernet: bgmac: init sequence bug xen-netback: don't vfree() queues under spinlock xen-netback: keep a local pointer for vif in backend_disconnect() netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups netfilter: nf_conntrack_sip: fix wrong memory initialisation can: flexcan: fix typo in comment can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer can: gs_usb: fix coding style can: gs_usb: Don't use stack memory for USB transfers ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines ixgbe: update the rss key on h/w, when ethtool ask for it ...
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>Ingo Molnar2-0/+2
Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01net: bridge: allow IPv6 when multicast flood is disabledMike Manning1-1/+2
Even with multicast flooding turned off, IPv6 ND should still work so that IPv6 connectivity is provided. Allow this by continuing to flood multicast traffic originated by us. Fixes: b6cb5ac8331b ("net: bridge: add per-port multicast flood flag") Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Mike Manning <mmanning@brocade.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-01bridge: Fix error path in nbp_vlan_initYotam Gigi1-1/+1
Fix error path order in nbp_vlan_init, so if switchdev_port_attr_set call failes, the vlan_hash wouldn't be destroyed before inited. Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-27lib/vsprintf.c: remove %Z supportAlexey Dobriyan1-1/+1
Now that %z is standartised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple ATM drivers. In case anyone didn't notice lib/vsprintf.o is about half of SLUB which is in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more. Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-17bridge: don't indicate expiry on NTF_EXT_LEARNED fdb entriesRoopa Prabhu1-1/+1
added_by_external_learn fdb entries are added and expired by external entities like switchdev driver or external controllers. ageing is already disabled for such entries. Hence, don't indicate expiry for such fdb entries. CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17bridge: vlan_tunnel: explicitly reset metadata attrs to NULL on failureRoopa Prabhu1-0/+2
Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14bridge: fdb: converge fdb_delete_by functions into oneNikolay Aleksandrov1-47/+15
We can simplify the logic of entries pointing to the bridge by converging the fdb_delete_by functions, this would allow us to use the same function for both cases since the fdb's dst is set to NULL if it is pointing to the bridge thus we can always check for a port match. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14bridge: fdb: add proper lock checks in searching functionsNikolay Aleksandrov2-0/+13
In order to avoid new errors add checks to br_fdb_find and fdb_find_rcu functions. The first requires hash_lock, the second obviously RCU. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14bridge: fdb: converge fdb searching functions into oneNikolay Aleksandrov4-70/+54
Before this patch we had 3 different fdb searching functions which was confusing. This patch reduces all of them to one - fdb_find_rcu(), and two flavors: br_fdb_find() which requires hash_lock and br_fdb_find_rcu which requires RCU. This makes it clear what needs to be used, we also remove two abusers of __br_fdb_get which called it under hash_lock and replace them with br_fdb_find(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10switchdev: bridge: Offload mc router portsNogah Frankel1-0/+15
Offload the mc router ports list, whenever it is being changed. It is done because in some cases mc packets needs to be flooded to all the ports in this list. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10bridge: mcast: Merge the mc router ports deletions to one functionNogah Frankel1-15/+9
There are three places where a port gets deleted from the mc router port list. This patch join the actual deletion to one function. It will be helpful for later patch that will offload changes in the mc router ports list. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10switchdev: bridge: Offload multicast disabledNogah Frankel1-0/+16
Offload multicast disabled flag, for more accurate mc flood behavior: When it is on, the mdb should be ignored. When it is off, unregistered mc packets should be flooded to mc router ports. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08bridge: vlan tunnel id info range fill size calc cleanupsRoopa Prabhu1-18/+16
This fixes a bug and cleans up tunnelid range size calculation code by using consistent variable names and checks in size calculation and fill functions. tested for a few cases of vlan-vni range mappings: (output from patched iproute2): $bridge vlan showtunnel port vid tunid vxlan0 100-105 1000-1005 200 2000 210 2100 211-213 2100-2102 214 2104 216-217 2108-2109 219 2119 Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07bridge: avoid unnecessary read of jiffiesstephen hemminger2-4/+8
Jiffies is volatile so read it once. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07bridge: remove unnecessary check for vtbegin in br_fill_vlan_tinfo_rangeRoopa Prabhu1-1/+1
vtbegin should not be NULL in this function, Its already checked by the caller. this should silence the below smatch complaint: net/bridge/br_netlink_tunnel.c:144 br_fill_vlan_tinfo_range() error: we previously assumed 'vtbegin' could be null (see line 130) net/bridge/br_netlink_tunnel.c 129 130 if (vtbegin && vtend && (vtend->vid - vtbegin->vid) > 0) { ^^^^^^^ Check for NULL. Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Reported-By: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07bridge: tunnel: fix attribute checks in br_parse_vlan_tunnel_infoNikolay Aleksandrov1-4/+4
These checks should go after the attributes have been parsed otherwise we're using tb uninitialized. Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07net: bridge: remove redundant check to see if err is setColin Ian King1-3/+0
The error check on err is redundant as it is being checked previously each time it has been updated. Remove this redundant check. Detected with CoverityScan, CID#140030("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06bridge: fdb: write to used and updated at most once per jiffyNikolay Aleksandrov2-2/+4
Writing once per jiffy is enough to limit the bridge's false sharing. After this change the bridge doesn't show up in the local load HitM stats. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06bridge: move write-heavy fdb members in their own cache lineNikolay Aleksandrov1-4/+6
Fdb's used and updated fields are written to on every packet forward and packet receive respectively. Thus if we are receiving packets from a particular fdb, they'll cause false-sharing with everyone who has looked it up (even if it didn't match, since mac/vid share cache line!). The "used" field is even worse since it is updated on every packet forward to that fdb, thus the standard config where X ports use a single gateway results in 100% fdb false-sharing. Note that this patch does not prevent the last scenario, but it makes it better for other bridge participants which are not using that fdb (and are only doing lookups over it). The point is with this move we make sure that only communicating parties get the false-sharing, in a later patch we'll show how to avoid that too. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06bridge: move to workqueue gcNikolay Aleksandrov10-23/+29
Move the fdb garbage collector to a workqueue which fires at least 10 milliseconds apart and cleans chain by chain allowing for other tasks to run in the meantime. When having thousands of fdbs the system is much more responsive. Most importantly remove the need to check if the matched entry has expired in __br_fdb_get that causes false-sharing and is completely unnecessary if we cleanup entries, at worst we'll get 10ms of traffic for that entry before it gets deleted. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06bridge: modify bridge and port to have often accessed fields in one cache lineNikolay Aleksandrov1-23/+20
Move around net_bridge so the vlan fields are in the beginning since they're checked on every packet even if vlan filtering is disabled. For the port move flags & vlan group to the beginning, so they're in the same cache line with the port's state (both flags and state are checked on each packet). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>