aboutsummaryrefslogtreecommitdiffstats
path: root/net/bridge (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-01-09netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain()Patrick McHardy1-1/+1
We don't encode argument types into function names and since besides nft_do_chain() there are only AF-specific versions, there is no risk of confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09netfilter: nf_tables: minor nf_chain_type cleanupsPatrick McHardy1-2/+2
Minor nf_chain_type cleanups: - reorder struct to plug a hoe - rename struct module member to "owner" for consistency - rename nf_hookfn array to "hooks" for consistency - reorder initializers for better readability Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09netfilter: nf_tables: constify chain type definitions and pointersPatrick McHardy1-1/+1
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09netfilter: nf_tables: add missing module references to chain typesPatrick McHardy1-0/+1
In some cases we neither take a reference to the AF info nor to the chain type, allowing the module to be unloaded while in use. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07netfilter: nf_tables: add support for multi family tablesPatrick McHardy1-0/+1
Add support to register chains to multiple hooks for different address families for mixed IPv4/IPv6 tables. Signed-off-by: Patrick McHardy <kaber@trash.net>
2014-01-07netfilter: nf_tables: make chain types override the default AF functionsPatrick McHardy1-19/+19
Currently the AF-specific hook functions override the chain-type specific hook functions. That doesn't make too much sense since the chain types are a special case of the AF-specific hooks. Make the AF-specific hook functions the default and make the optional chain type hooks override them. As a side effect, the necessary code restructuring reduces the code size, f.i. in case of nf_tables_ipv4.o: nf_tables_ipv4_init_net | -24 nft_do_chain_ipv4 | -113 2 functions changed, 137 bytes removed, diff: -137 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-06Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+2
Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06bridge: use DEVICE_ATTR_xx macrossfeldma@cumulusnetworks.com1-141/+108
Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06bridge: use spin_lock_bh() in br_multicast_set_hash_maxCurt Brune1-2/+2
br_multicast_set_hash_max() is called from process context in net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function. br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock), which can deadlock the CPU if a softirq that also tries to take the same lock interrupts br_multicast_set_hash_max() while the lock is held . This can happen quite easily when any of the bridge multicast timers expire, which try to take the same lock. The fix here is to use spin_lock_bh(), preventing other softirqs from executing on this CPU. Steps to reproduce: 1. Create a bridge with several interfaces (I used 4). 2. Set the "multicast query interval" to a low number, like 2. 3. Enable the bridge as a multicast querier. 4. Repeatedly set the bridge hash_max parameter via sysfs. # brctl addbr br0 # brctl addif br0 eth1 eth2 eth3 eth4 # brctl setmcqi br0 2 # brctl setmcquerier br0 1 # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done Signed-off-by: Curt Brune <curt@cumulusnetworks.com> Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04net: unify the pcpu_tstats and br_cpu_netstats as oneLi RongQing3-15/+7
They are same, so unify them as one, pcpu_sw_netstats. Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats from if_tunnel and remove br_cpu_netstats from br_private.h Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01netlink: cleanup rntl_af_registerstephen hemminger1-4/+1
The function __rtnl_af_register is never called outside this code, and the return value is always 0. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19bridge: change the position of '{' to the pre linetanxiaojun4-18/+9
That open brace { should be on the previous line. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19bridge: change "foo* bar" to "foo *bar"tanxiaojun2-10/+10
"foo * bar" should be "foo *bar". Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19bridge: add space before '(/{', after ',', etc.tanxiaojun7-12/+12
Spaces required before the open parenthesis '(', before the open brace '{', after that ',' and around that '?/:'. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19bridge: remove unnecessary parenthesestanxiaojun2-3/+3
Return is not a function, parentheses are not required. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19bridge: remove unnecessary condition judgmenttanxiaojun2-4/+2
Because err is always negative, remove unnecessary condition judgment. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18bridge: spelling fixestanxiaojun3-3/+3
Fix spelling errors in bridge driver. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10net: more spelling fixesstephen hemminger1-2/+2
Various spelling fixes in networking stack Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-1/+11
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that Daniel's direct transmit changes depend upon. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06br: fix use of ->rx_handler_data in code executed on non-rx_handler pathJiri Pirko2-1/+11
br_stp_rcv() is reached by non-rx_handler path. That means there is no guarantee that dev is bridge port and therefore simple NULL check of ->rx_handler_data is not enough. There is need to check if dev is really bridge port and since only rcu read lock is held here, do it by checking ->rx_handler pointer. Note that synchronize_net() in netdev_rx_handler_unregister() ensures this approach as valid. Introduced originally by: commit f350a0a87374418635689471606454abc7beaa3a "bridge: use rx_handler_data pointer to store net_bridge_port pointer" Fixed but not in the best way by: commit b5ed54e94d324f17c97852296d61a143f01b227a "bridge: fix RCU races with bridge port" Reintroduced by: commit 716ec052d2280d511e10e90ad54a86f5b5d4dcc2 "bridge: fix NULL pointer deref of br_port_get_rcu" Please apply to stable trees as well. Thanks. RH bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1025770 Reported-by: Laine Stump <laine@redhat.com> Debugged-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06netfilter: Fix FSF address in file headersJeff Kirsher1-2/+1
Several files refer to an old address for the Free Software Foundation in the file header comment. Resolve by replacing the address with the URL <http://www.gnu.org/licenses/> so that we do not have to keep updating the header comments anytime the address changes. CC: netfilter@vger.kernel.org CC: Pablo Neira Ayuso <pablo@netfilter.org> CC: Patrick McHardy <kaber@trash.net> CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-3/+5
Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains fixes for your net tree, they are: * Remove extra quote from connlimit configuration in Kconfig, from Randy Dunlap. * Fix missing mss option in syn packets sent to the backend in our new synproxy target, from Martin Topholm. * Use window scale announced by client when sending the forged syn to the backend, from Martin Topholm. * Fix IPv6 address comparison in ebtables, from Luís Fernando Cornachioni Estrozi. * Fix wrong endianess in sequence adjustment which breaks helpers in NAT configurations, from Phil Oester. * Fix the error path handling of nft_compat, from me. * Make sure the global conntrack counter is decremented after the object has been released, also from me. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20bridge: flush br's address entry in fdb when remove theDing Tianhong1-0/+2
bridge dev When the following commands are executed: brctl addbr br0 ifconfig br0 hw ether <addr> rmmod bridge The calltrace will occur: [ 563.312114] device eth1 left promiscuous mode [ 563.312188] br0: port 1(eth1) entered disabled state [ 563.468190] kmem_cache_destroy bridge_fdb_cache: Slab cache still has objects [ 563.468197] CPU: 6 PID: 6982 Comm: rmmod Tainted: G O 3.12.0-0.7-default+ #9 [ 563.468199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 563.468200] 0000000000000880 ffff88010f111e98 ffffffff814d1c92 ffff88010f111eb8 [ 563.468204] ffffffff81148efd ffff88010f111eb8 0000000000000000 ffff88010f111ec8 [ 563.468206] ffffffffa062a270 ffff88010f111ed8 ffffffffa063ac76 ffff88010f111f78 [ 563.468209] Call Trace: [ 563.468218] [<ffffffff814d1c92>] dump_stack+0x6a/0x78 [ 563.468234] [<ffffffff81148efd>] kmem_cache_destroy+0xfd/0x100 [ 563.468242] [<ffffffffa062a270>] br_fdb_fini+0x10/0x20 [bridge] [ 563.468247] [<ffffffffa063ac76>] br_deinit+0x4e/0x50 [bridge] [ 563.468254] [<ffffffff810c7dc9>] SyS_delete_module+0x199/0x2b0 [ 563.468259] [<ffffffff814e0922>] system_call_fastpath+0x16/0x1b [ 570.377958] Bridge firewalling registered --------------------------- cut here ------------------------------- The reason is that when the bridge dev's address is changed, the br_fdb_change_mac_address() will add new address in fdb, but when the bridge was removed, the address entry in the fdb did not free, the bridge_fdb_cache still has objects when destroy the cache, Fix this by flushing the bridge address entry when removing the bridge. v2: according to the Toshiaki Makita and Vlad's suggestion, I only delete the vlan0 entry, it still have a leak here if the vlan id is other number, so I need to call fdb_delete_by_port(br, NULL, 1) to flush all entries whose dst is NULL for the bridge. Suggested-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-14/+11
Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...
2013-11-19netfilter: ebt_ip6: fix source and destination matchingLuís Fernando Cornachioni Estrozi1-3/+5
This bug was introduced on commit 0898f99a2. This just recovers two checks that existed before as suggested by Bart De Schuymer. Signed-off-by: Luís Fernando Cornachioni Estrozi <lestrozi@uolinc.com> Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-14bridge: Fix memory leak when deleting bridge with vlan filtering enabledToshiaki Makita1-0/+1
We currently don't call br_vlan_flush() when deleting a bridge, which leads to memory leak if br->vlan_info is allocated. Steps to reproduce: while : do brctl addbr br0 bridge vlan add dev br0 vid 10 self brctl delbr br0 done We can observe the cache size of corresponding slab entry (as kmalloc-2048 in SLUB) is increased. kmemleak output: unreferenced object 0xffff8800b68a7000 (size 2048): comm "bridge", pid 2086, jiffies 4295774704 (age 47.656s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 48 9b 36 00 88 ff ff .........H.6.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815eb6ae>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8116a1ca>] kmem_cache_alloc_trace+0xca/0x220 [<ffffffffa03eddd6>] br_vlan_add+0x66/0xe0 [bridge] [<ffffffffa03e543c>] br_setlink+0x2dc/0x340 [bridge] [<ffffffff8150e481>] rtnl_bridge_setlink+0x101/0x200 [<ffffffff8150d9d9>] rtnetlink_rcv_msg+0x99/0x260 [<ffffffff81528679>] netlink_rcv_skb+0xa9/0xc0 [<ffffffff8150d938>] rtnetlink_rcv+0x28/0x30 [<ffffffff81527ccd>] netlink_unicast+0xdd/0x190 [<ffffffff8152807f>] netlink_sendmsg+0x2ff/0x740 [<ffffffff814e8368>] sock_sendmsg+0x88/0xc0 [<ffffffff814e8ac8>] ___sys_sendmsg.part.14+0x298/0x2b0 [<ffffffff814e91de>] __sys_sendmsg+0x4e/0x90 [<ffffffff814e922e>] SyS_sendmsg+0xe/0x10 [<ffffffff81601669>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14bridge: Call vlan_vid_del for all vids at nbp_vlan_flushToshiaki Makita1-0/+4
We should call vlan_vid_del for all vids at nbp_vlan_flush to prevent vid_info->refcount from being leaked when detaching a bridge port. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14bridge: Use vlan_vid_[add/del] instead of direct ndo_vlan_rx_[add/kill]_vid callsToshiaki Makita1-14/+6
We should use wrapper functions vlan_vid_[add/del] instead of ndo_vlan_rx_[add/kill]_vid. Otherwise, we might be not able to communicate using vlan interface in a certain situation. Example of problematic case: vconfig add eth0 10 brctl addif br0 eth0 bridge vlan add dev eth0 vid 10 bridge vlan del dev eth0 vid 10 brctl delif br0 eth0 In this case, we cannot communicate via eth0.10 because vlan 10 is filtered by NIC that has the vlan filtering feature. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+7
Pull core locking changes from Ingo Molnar: "The biggest changes: - add lockdep support for seqcount/seqlocks structures, this unearthed both bugs and required extra annotation. - move the various kernel locking primitives to the new kernel/locking/ directory" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) block: Use u64_stats_init() to initialize seqcounts locking/lockdep: Mark __lockdep_count_forward_deps() as static lockdep/proc: Fix lock-time avg computation locking/doc: Update references to kernel/mutex.c ipv6: Fix possible ipv6 seqlock deadlock cpuset: Fix potential deadlock w/ set_mems_allowed seqcount: Add lockdep functionality to seqcount/seqlock structures net: Explicitly initialize u64_stats_sync structures for lockdep locking: Move the percpu-rwsem code to kernel/locking/ locking: Move the lglocks code to kernel/locking/ locking: Move the rwsem code to kernel/locking/ locking: Move the rtmutex code to kernel/locking/ locking: Move the semaphore core to kernel/locking/ locking: Move the spinlock code to kernel/locking/ locking: Move the lockdep code to kernel/locking/ locking: Move the mutex code to kernel/locking/ hung_task debugging: Add tracepoint to report the hang x86/locking/kconfig: Update paravirt spinlock Kconfig description lockstat: Report avg wait and hold times lockdep, x86/alternatives: Drop ancient lockdep fixup message ...
2013-11-06net: Explicitly initialize u64_stats_sync structures for lockdepJohn Stultz1-0/+7
In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-04Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftablesDavid S. Miller2-2/+40
Pablo Neira Ayuso says: ==================== This batch contains fives nf_tables patches for your net-next tree, they are: * Fix possible use after free in the module removal path of the x_tables compatibility layer, from Dan Carpenter. * Add filter chain type for the bridge family, from myself. * Fix Kconfig dependencies of the nf_tables bridge family with the core, from myself. * Fix sparse warnings in nft_nat, from Tomasz Bursztyka. * Remove duplicated include in the IPv4 family support for nf_tables, from Wei Yongjun. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-0/+2
Pablo Neira Ayuso says: ==================== This is another batch containing Netfilter/IPVS updates for your net-next tree, they are: * Six patches to make the ipt_CLUSTERIP target support netnamespace, from Gao feng. * Two cleanups for the nf_conntrack_acct infrastructure, introducing a new structure to encapsulate conntrack counters, from Holger Eitzenberger. * Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann. * Skip checksum recalculation in SCTP support for IPVS, also from Daniel Borkmann. * Fix behavioural change in xt_socket after IP early demux, from Florian Westphal. * Fix bogus large memory allocation in the bitmap port set type in ipset, from Jozsef Kadlecsik. * Fix possible compilation issues in the hash netnet set type in ipset, also from Jozsef Kadlecsik. * Define constants to identify netlink callback data in ipset dumps, again from Jozsef Kadlecsik. * Use sock_gen_put() in xt_socket to replace xt_socket_put_sk, from Eric Dumazet. * Improvements for the SH scheduler in IPVS, from Alexander Frolkin. * Remove extra delay due to unneeded rcu barrier in IPVS net namespace cleanup path, from Julian Anastasov. * Save some cycles in ip6t_REJECT by skipping checksum validation in packets leaving from our stack, from Stanislav Fomichev. * Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-35/+27
Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29bridge: pass correct vlan id to multicast codeVlad Yasevich4-29/+25
Currently multicast code attempts to extrace the vlan id from the skb even when vlan filtering is disabled. This can lead to mdb entries being created with the wrong vlan id. Pass the already extracted vlan id to the multicast filtering code to make the correct id is used in creation as well as lookup. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28netfilter: bridge: nf_tables: add filter chain typePablo Neira Ayuso1-2/+39
This patch adds the filter chain type which is required to create filter chains in the bridge family from userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-28netfilter: bridge: fix nf_tables bridge dependencies with main corePablo Neira Ayuso1-0/+1
when CONFIG_NF_TABLES[_MODULE] is not enabled, but CONFIG_NF_TABLES_BRIDGE is enabled: net/bridge/netfilter/nf_tables_bridge.c: In function 'nf_tables_bridge_init_net': net/bridge/netfilter/nf_tables_bridge.c:24:5: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:25:9: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:28:2: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:30:34: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:35:11: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c: In function 'nf_tables_bridge_exit_net': net/bridge/netfilter/nf_tables_bridge.c:41:27: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:42:11: error: 'struct net' has no member named 'nft' Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-27bridge: netfilter: orphan skb before invoking ip netfilter hooksFlorian Westphal1-0/+2
Pekka Pietikäinen reports xt_socket behavioural change after commit 00028aa37098o (netfilter: xt_socket: use IP early demux). Reason is xt_socket now no longer does an unconditional sk lookup - it re-uses existing skb->sk if possible, assuming ->sk was set by ip early demux. However, when netfilter is invoked via bridge, this can cause 'bogus' sockets to be examined by the match, e.g. a 'tun' device socket. bridge netfilter should orphan the skb just like the routing path before invoking ipv4/ipv6 netfilter hooks to avoid this. Reported-and-tested-by: Pekka Pietikäinen <pp@ee.oulu.fi> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-23Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-6/+3
Pablo Neira Ayuso says: ==================== The following patchset contains three netfilter fixes for your net tree, they are: * A couple of fixes to resolve info leak to userspace due to uninitialized memory area in ulogd, from Mathias Krause. * Fix instruction ordering issues that may lead to the access of uninitialized data in x_tables. The problem involves the table update (producer) and the main packet matching (consumer) routines. Detected in SMP ARMv7, from Will Deacon. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller7-79/+99
Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22Revert "bridge: only expire the mdb entry when query is received"Linus Lüssing3-22/+28
While this commit was a good attempt to fix issues occuring when no multicast querier is present, this commit still has two more issues: 1) There are cases where mdb entries do not expire even if there is a querier present. The bridge will unnecessarily continue flooding multicast packets on the according ports. 2) Never removing an mdb entry could be exploited for a Denial of Service by an attacker on the local link, slowly, but steadily eating up all memory. Actually, this commit became obsolete with "bridge: disable snooping if there is no querier" (b00589af3b) which included fixes for a few more cases. Therefore reverting the following commits (the commit stated in the commit message plus three of its follow up fixes): ==================== Revert "bridge: update mdb expiration timer upon reports." This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc. Revert "bridge: do not call setup_timer() multiple times" This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1. Revert "bridge: fix some kernel warning in multicast timer" This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1. Revert "bridge: only expire the mdb entry when query is received" This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b. ==================== CC: Cong Wang <amwang@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypesJoe Perches2-173/+150
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Fix updating FDB entries when the PVID is appliedToshiaki Makita1-0/+1
We currently set the value that variable vid is pointing, which will be used in FDB later, to 0 at br_allowed_ingress() when we receive untagged or priority-tagged frames, even though the PVID is valid. This leads to FDB updates in such a wrong way that they are learned with VID 0. Update the value to that of PVID if the PVID is applied. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Fix the way the PVID is referencedToshiaki Makita1-3/+1
We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is set or not at br_get_pvid(), while we don't care about the bit in adding/deleting the PVID, which makes it impossible to forward any incomming untagged frame with vlan_filtering enabled. Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate that the PVID is not set, which is slightly more efficient than using the VLAN_TAG_PRESENT. Fix the problem by getting rid of using the VLAN_TAG_PRESENT. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Apply the PVID to priority-tagged framesToshiaki Makita1-7/+20
IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames use the PVID for the port as its VID. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Apply the PVID to not only untagged frames but also priority-tagged frames. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Don't use VID 0 and 4095 in vlan filteringToshiaki Makita3-54/+49
IEEE 802.1Q says that: - VID 0 shall not be configured as a PVID, or configured in any Filtering Database entry. - VID 4095 shall not be configured as a PVID, or transmitted in a tag header. This VID value may be used to indicate a wildcard match for the VID in management operations or Filtering Database entries. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Don't accept adding these VIDs in the vlan_filtering implementation. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17bridge: Correctly clamp MAX forward_delay when enabling STPVlad Yasevich1-1/+1
Commit be4f154d5ef0ca147ab6bcd38857a774133f5450 bridge: Clamp forward_delay when enabling STP had a typo when attempting to clamp maximum forward delay. It is possible to set bridge_forward_delay to be higher then permitted maximum when STP is off. When turning STP on, the higher then allowed delay has to be clamed down to max value. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-14netfilter: nf_tables: complete net namespace supportPablo Neira Ayuso1-2/+30
Register family per netnamespace to ensure that sets are only visible in its approapriate namespace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: add nftablesPatrick McHardy3-0/+42
This patch adds nftables which is the intended successor of iptables. This packet filtering framework reuses the existing netfilter hooks, the connection tracking system, the NAT subsystem, the transparent proxying engine, the logging infrastructure and the userspace packet queueing facilities. In a nutshell, nftables provides a pseudo-state machine with 4 general purpose registers of 128 bits and 1 specific purpose register to store verdicts. This pseudo-machine comes with an extensible instruction set, a.k.a. "expressions" in the nftables jargon. The expressions included in this patch provide the basic functionality, they are: * bitwise: to perform bitwise operations. * byteorder: to change from host/network endianess. * cmp: to compare data with the content of the registers. * counter: to enable counters on rules. * ct: to store conntrack keys into register. * exthdr: to match IPv6 extension headers. * immediate: to load data into registers. * limit: to limit matching based on packet rate. * log: to log packets. * meta: to match metainformation that usually comes with the skbuff. * nat: to perform Network Address Translation. * payload: to fetch data from the packet payload and store it into registers. * reject (IPv4 only): to explicitly close connection, eg. TCP RST. Using this instruction-set, the userspace utility 'nft' can transform the rules expressed in human-readable text representation (using a new syntax, inspired by tcpdump) to nftables bytecode. nftables also inherits the table, chain and rule objects from iptables, but in a more configurable way, and it also includes the original datatype-agnostic set infrastructure with mapping support. This set infrastructure is enhanced in the follow up patch (netfilter: nf_tables: add netlink set API). This patch includes the following components: * the netlink API: net/netfilter/nf_tables_api.c and include/uapi/netfilter/nf_tables.h * the packet filter core: net/netfilter/nf_tables_core.c * the expressions (described above): net/netfilter/nft_*.c * the filter tables: arp, IPv4, IPv6 and bridge: net/ipv4/netfilter/nf_tables_ipv4.c net/ipv6/netfilter/nf_tables_ipv6.c net/ipv4/netfilter/nf_tables_arp.c net/bridge/netfilter/nf_tables_bridge.c * the NAT table (IPv4 only): net/ipv4/netfilter/nf_table_nat_ipv4.c * the route table (similar to mangle): net/ipv4/netfilter/nf_table_route_ipv4.c net/ipv6/netfilter/nf_table_route_ipv6.c * internal definitions under: include/net/netfilter/nf_tables.h include/net/netfilter/nf_tables_core.h * It also includes an skeleton expression: net/netfilter/nft_expr_template.c and the preliminary implementation of the meta target net/netfilter/nft_meta_target.c It also includes a change in struct nf_hook_ops to add a new pointer to store private data to the hook, that is used to store the rule list per chain. This patch is based on the patch from Patrick McHardy, plus merged accumulated cleanups, fixes and small enhancements to the nftables code that has been done since 2009, which are: From Patrick McHardy: * nf_tables: adjust netlink handler function signatures * nf_tables: only retry table lookup after successful table module load * nf_tables: fix event notification echo and avoid unnecessary messages * nft_ct: add l3proto support * nf_tables: pass expression context to nft_validate_data_load() * nf_tables: remove redundant definition * nft_ct: fix maxattr initialization * nf_tables: fix invalid event type in nf_tables_getrule() * nf_tables: simplify nft_data_init() usage * nf_tables: build in more core modules * nf_tables: fix double lookup expression unregistation * nf_tables: move expression initialization to nf_tables_core.c * nf_tables: build in payload module * nf_tables: use NFPROTO constants * nf_tables: rename pid variables to portid * nf_tables: save 48 bits per rule * nf_tables: introduce chain rename * nf_tables: check for duplicate names on chain rename * nf_tables: remove ability to specify handles for new rules * nf_tables: return error for rule change request * nf_tables: return error for NLM_F_REPLACE without rule handle * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification * nf_tables: fix NLM_F_MULTI usage in netlink notifications * nf_tables: include NLM_F_APPEND in rule dumps From Pablo Neira Ayuso: * nf_tables: fix stack overflow in nf_tables_newrule * nf_tables: nft_ct: fix compilation warning * nf_tables: nft_ct: fix crash with invalid packets * nft_log: group and qthreshold are 2^16 * nf_tables: nft_meta: fix socket uid,gid handling * nft_counter: allow to restore counters * nf_tables: fix module autoload * nf_tables: allow to remove all rules placed in one chain * nf_tables: use 64-bits rule handle instead of 16-bits * nf_tables: fix chain after rule deletion * nf_tables: improve deletion performance * nf_tables: add missing code in route chain type * nf_tables: rise maximum number of expressions from 12 to 128 * nf_tables: don't delete table if in use * nf_tables: fix basechain release From Tomasz Bursztyka: * nf_tables: Add support for changing users chain's name * nf_tables: Change chain's name to be fixed sized * nf_tables: Add support for replacing a rule by another one * nf_tables: Update uapi nftables netlink header documentation From Florian Westphal: * nft_log: group is u16, snaplen u32 From Phil Oester: * nf_tables: operational limit match Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: pass hook ops to hookfnPatrick McHardy3-20/+34
Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-11bridge: update mdb expiration timer upon reports.Vlad Yasevich1-1/+8
commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b bridge: only expire the mdb entry when query is received changed the mdb expiration timer to be armed only when QUERY is received. Howerver, this causes issues in an environment where the multicast server socket comes and goes very fast while a client is trying to send traffic to it. The root cause is a race where a sequence of LEAVE followed by REPORT messages can race against QUERY messages generated in response to LEAVE. The QUERY ends up starting the expiration timer, and that timer can potentially expire after the new REPORT message has been received signaling the new join operation. This leads to a significant drop in multicast traffic and possible complete stall. The solution is to have REPORT messages update the expiration timer on entries that already exist. CC: Cong Wang <xiyou.wangcong@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>