linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2014-03-24	ipv4: remove ip_rt_dump from route.c	Li RongQing	2	-6/+1
	ip_rt_dump do nothing after IPv4 route caches removal, so we can remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24	ipv4: remove ipv4_ifdown_dst from route.c	Li RongQing	1	-6/+0
	ipv4_ifdown_dst does nothing after IPv4 route caches removal, so we can remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-22	batman-adv: Start new development cycle	Simon Wunderlich	1	-1/+1
	Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: improve DAT documentation	Antonio Quartulli	2	-2/+6
	Add missing documentation for BATADV_DAT_ADDR_MAX and convert an existing documentation to kerneldoc Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-03-22	batman-adv: improve the TT flags documentation	Antonio Quartulli	1	-4/+24
	Convert the current documentation for the TT flags in proper kerneldoc and improve it by adding an explanation for each of the flags. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-03-22	batman-adv: Send multicast packets to nodes with a WANT_ALL flag	Linus Lüssing	6	-3/+198
	With this patch a node sends IPv4 multicast packets to nodes which have a BATADV_MCAST_WANT_ALL_IPV4 flag set and IPv6 multicast packets to nodes which have a BATADV_MCAST_WANT_ALL_IPV6 flag set, too. Why is this needed? There are scenarios involving bridges where multicast report snooping and multicast TT announcements are not sufficient, which would lead to packet loss for some nodes otherwise: MLDv1 and IGMPv1/IGMPv2 have a suppression mechanism for multicast listener reports. When we have an MLDv1/IGMPv1/IGMPv2 querier behind a bridge then our snooping bridge is potentially not going to see any reports even though listeners exist because according to RFC4541 such reports are only forwarded to multicast routers: ----------------------------------------------------------- --------------- {Querier}---\|Snoop. Switch\|----{Listener} --------------- \ ^ ------- \| br0 \| < ??? ------- \ _-~---~_ _-~/ ~-_ ~ batman-adv \-----{Sender} \~_ cloud ~/ -~~__-__-~_/ I) MLDv1 Query: {Querier} -> flooded II) MLDv1 Report: {Listener} -> {Querier} -> br0 cannot detect the {Listener} => Packets from {Sender} need to be forwarded to all detected listeners and MLDv1/IGMPv1/IGMPv2 queriers. ----------------------------------------------------------- Note that we do not need to explicitly forward to MLDv2/IGMPv3 queriers, because these protocols have no report suppression: A bridge has no trouble detecting MLDv2/IGMPv3 listeners. Even though we do not support bridges yet we need to provide the according infrastructure already to not break compatibility later. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support	Linus Lüssing	6	-8/+156
	With this patch a node may additionally perform the dropping or unicasting behaviour for a link-local IPv4 and link-local-all-nodes IPv6 multicast packet, too. The extra counter and BATADV_MCAST_WANT_ALL_UNSNOOPABLES flag is needed because with a future bridge snooping support integration a node with a bridge on top of its soft interface is not able to reliably detect its multicast listeners for IPv4 link-local and the IPv6 link-local-all-nodes addresses anymore (see RFC4541, section 2.1.2.2 and section 3). Even though this new flag does make "no difference" now, it'll ensure a seamless integration of multicast bridge support without needing to break compatibility later. Also note, that even with multicast bridge support it won't be possible to optimize 224.0.0.x and ff02::1 towards nodes with bridges, they will always receive these ranges. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: Modified forwarding behaviour for multicast packets	Linus Lüssing	9	-26/+266
	With this patch a multicast packet is not always simply flooded anymore, the behaviour for the following cases is changed to reduce unnecessary overhead: If all nodes within the horizon of a certain node have signalized multicast listener announcement capability then an IPv6 multicast packet with a destination of IPv6 link-local scope (excluding ff02::1) coming from the upstream of this node... * ...is dropped if there is no according multicast listener in the translation table, * ...is forwarded via unicast if there is a single node with interested multicast listeners * ...and otherwise still gets flooded. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: Announce new capability via multicast TVLV	Linus Lüssing	7	-6/+167
	If the soft interface of a node is not part of a bridge then a node announces a new multicast TVLV: The existence of this TVLV signalizes that this node is announcing all of its multicast listeners via the translation table infrastructure. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: introduce capability initialization bitfield	Linus Lüssing	3	-9/+11
	The new bitfield allows us to keep track whether capability subsets of an originator have gone through their initialization phase yet. The translation table is the only user right now, but a new one will be added soon. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: Multicast Listener Announcements via Translation Table	Linus Lüssing	8	-4/+318
	With this patch a node which has no bridge interface on top of its soft interface announces its local multicast listeners via the translation table. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: add kerneldoc for dst_hint argument	Antonio Quartulli	2	-0/+3
	Some helper functions used along the TX path have now a new "dst_hint" argument but the kerneldoc was missing. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-03-22	batman-adv: call unregister_netdev() to have it handle the locking for us	Marek Lindner	1	-4/+1
	Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: fix a few kerneldoc inconsistencies	Simon Wunderlich	1	-5/+5
	Reported-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: prefer ether_addr_copy to memcpy	Antonio Quartulli	13	-71/+70
	On some architectures ether_addr_copy() is slightly faster than memcpy() therefore use the former when possible. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-03-22	batman-adv: remove obsolete skb_reset_mac_header() in batadv_bla_tx()	Linus Lüssing	1	-3/+0
	Our .ndo_start_xmit handler (batadv_interface_tx()) can rely on having the skb mac header pointer set correctly since the following commit present in kernels >= 3.9: "net: reset mac header in dev_start_xmit()" (6d1ccff627) Therefore this commit removes the according, now redundant, skb_reset_mac_header() call in batadv_bla_tx(). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: use vlan_/eth_hdr() instead of skb->data in interface_tx path	Linus Lüssing	5	-14/+18
	Our .ndo_start_xmit handler (batadv_interface_tx()) can rely on having the skb mac header pointer set correctly since the following commit present in kernels >= 3.9: "net: reset mac header in dev_start_xmit()" (6d1ccff627) Therefore we can safely use eth_hdr() and vlan_eth_hdr() instead of skb->data now, which spares us some ugly type casts. At the same time set the mac_header in batadv_dat_snoop_incoming_arp_request() before sending the skb along the TX path. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22	batman-adv: fix coccinelle warnings	Fengguang Wu	1	-1/+1
	net/batman-adv/network-coding.c:1535:1-7: Replace memcpy with struct assignment Generated by: coccinelle/misc/memcpy-assign.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-20	ieee802154: dgram: cleanup set of broadcast panid	Alexander Aring	1	-1/+1
	This patch is only a cleanup to use the right define for a panid field. The broadcast address and panid broadcast is still the same value. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20	af_ieee802154: fix check on broadcast address	Alexander Aring	1	-1/+1
	This patch fixes an issue which was introduced by commit b70ab2e87f17176d18f67ef331064441a032b5f3 ("ieee802154: enforce consistent endianness in the 802.15.4 stack"). The correct behaviour should be a check on the broadcast address field which is 0xffff. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reported-by: Jan Luebbe <jlu@pengutronix.de> Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20	net: remove empty lines from tcp_syn_flood_action	Daniel Baluta	1	-2/+0
	Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20	af_iucv: recvmsg problem for SOCK_STREAM sockets	Ursula Braun	1	-0/+1
	Commit f9c41a62bba3f3f7ef3541b2a025e3371bcbba97 introduced a problem for SOCK_STREAM sockets, when only part of the incoming iucv message is received by user space. In this case the remaining data of the iucv message is lost. This patch makes sure an incompletely received iucv message is queued back to the receive queue. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reported-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18	netfilter: Add missing vmalloc.h include to nft_hash.c	David S. Miller	1	-0/+1
	Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18	ieee802154: properly unshare skbs in ieee802154 *_rcv functions	Phoebe Buckheister	3	-16/+21
	ieee802154 sockets do not properly unshare received skbs, which leads to panics (at least) when they are used in conjunction with 6lowpan, so run skb_share_check on received skbs. 6lowpan also contains a use-after-free, which is trivially fixed by replacing the inlined skb_share_check with the explicit call. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Tested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18	net: sched: use no more than one page in struct fw_head	Eric Dumazet	1	-26/+7
	In commit b4e9b520ca5d ("[NET_SCHED]: Add mask support to fwmark classifier") Patrick added an u32 field in fw_head, making it slightly bigger than one page. Lets use 256 slots to make fw_hash() more straight forward, and move @mask to the beginning of the structure as we often use a small number of skb->mark. @mask and first hash buckets share the same cache line. This brings back the memory usage to less than 4000 bytes, and permits John to add a rcu_head at the end of the structure later without any worry. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next	David S. Miller	14	-187/+638
	Steffen Klassert says: ==================== One patch to rename a newly introduced struct. The rest is the rework of the IPsec virtual tunnel interface for ipv6 to support inter address family tunneling and namespace crossing. 1) Rename the newly introduced struct xfrm_filter to avoid a conflict with iproute2. From Nicolas Dichtel. 2) Introduce xfrm_input_afinfo to access the address family dependent tunnel callback functions properly. 3) Add and use a IPsec protocol multiplexer for ipv6. 4) Remove dst_entry caching. vti can lookup multiple different dst entries, dependent of the configured xfrm states. Therefore it does not make to cache a dst_entry. 5) Remove caching of flow informations. vti6 does not use the the tunnel endpoint addresses to do route and xfrm lookups. 6) Update the vti6 to use its own receive hook. 7) Remove the now unused xfrm_tunnel_notifier. This was used from vti and is replaced by the IPsec protocol multiplexer hooks. 8) Support inter address family tunneling for vti6. 9) Check if the tunnel endpoints of the xfrm state and the vti interface are matching and return an error otherwise. 10) Enable namespace crossing for vti devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netfilter: conntrack: Fix UP builds	Eric Dumazet	1	-1/+1
	ARRAY_SIZE(nf_conntrack_locks) is undefined if spinlock_t is an empty structure. Replace it by CONNTRACK_LOCKS Fixes: 93bb0ceb75be ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Remove dead packet receive code (CONFIG_NETPOLL_TRAP)	Eric W. Biederman	2	-529/+2
	The netpoll packet receive code only becomes active if the netpoll rx_skb_hook is implemented, and there is not a single implementation of the netpoll rx_skb_hook in the kernel. All of the out of tree implementations I have found all call netpoll_poll which was removed from the kernel in 2011, so this change should not add any additional breakage. There are problems with the netpoll packet receive code. __netpoll_rx does not call dev_kfree_skb_irq or dev_kfree_skb_any in hard irq context. netpoll_neigh_reply leaks every skb it receives. Reception of packets does not work successfully on stacked devices (aka bonding, team, bridge, and vlans). Given that the netpoll packet receive code is buggy, there are no out of tree users that will be merged soon, and the code has not been used for in tree for a decade let's just remove it. Reverting this commit can server as a starting point for anyone who wants to resurrect netpoll packet reception support. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Move all receive processing under CONFIG_NETPOLL_TRAP	Eric W. Biederman	1	-17/+64
	Make rx_skb_hook, and rx in struct netpoll depend on CONFIG_NETPOLL_TRAP Make rx_lock, rx_np, and neigh_tx in struct netpoll_info depend on CONFIG_NETPOLL_TRAP Make the functions netpoll_rx_on, netpoll_rx, and netpoll_receive_skb no-ops when CONFIG_NETPOLL_TRAP is not set. Only build netpoll_neigh_reply, checksum_udp service_neigh_queue, pkt_is_ns, and __netpoll_rx when CONFIG_NETPOLL_TRAP is defined. Add helper functions netpoll_trap_setup, netpoll_trap_setup_info, netpoll_trap_cleanup, and netpoll_trap_cleanup_info that initialize and cleanup the struct netpoll and struct netpoll_info receive specific fields when CONFIG_NETPOLL_TRAP is enabled and do nothing otherwise. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Consolidate neigh_tx processing in service_neigh_queue	Eric W. Biederman	1	-22/+16
	Move the bond slave device neigh_tx handling into service_neigh_queue. In connection with neigh_tx processing remove unnecessary tests of a NULL netpoll_info. As the netpoll_poll_dev has already used and thus verified the existince of the netpoll_info. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Move netpoll_trap under CONFIG_NETPOLL_TRAP	Eric W. Biederman	1	-5/+9
	Now that we no longer need to receive packets to safely drain the network drivers receive queue move netpoll_trap and netpoll_set_trap under CONFIG_NETPOLL_TRAP Making netpoll_trap and netpoll_set_trap noop inline functions when CONFIG_NETPOLL_TRAP is not set. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Don't drop all received packets.	Eric W. Biederman	1	-11/+6
	Change the strategy of netpoll from dropping all packets received during netpoll_poll_dev to calling napi poll with a budget of 0 (to avoid processing drivers rx queue), and to ignore packets received with netif_rx (those will safely be placed on the backlog queue). All of the netpoll supporting drivers have been reviewed to ensure either thay use netif_rx or that a budget of 0 is supported by their napi poll routine and that a budget of 0 will not process the drivers rx queues. Not dropping packets makes NETPOLL_RX_DROP unnecesary so it is removed. npinfo->rx_flags is removed as rx_flags with just the NETPOLL_RX_ENABLED flag becomes just a redundant mirror of list_empty(&npinfo->rx_np). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Add netpoll_rx_processing	Eric W. Biederman	1	-2/+2
	Add a helper netpoll_rx_processing that reports when netpoll has receive side processing to perform. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Warn if more packets are processed than are budgeted	Eric W. Biederman	1	-0/+1
	There is already a warning for this case in the normal netpoll path, but put a copy here in case how netpoll calls the poll functions causes a differenet result. netpoll will shortly call the napi poll routine with a budget 0 to avoid any rx packets being processed. As nothing does that today we may encounter drivers that have problems so a netpoll specific warning seems desirable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Visit all napi handlers in poll_napi	Eric W. Biederman	1	-3/+0
	In poll_napi loop through all of the napi handlers even when the budget falls to 0 to ensure that we process all of the tx_queues, and so that we continue to call into drivers when our initial budget is 0. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: Pass budget into poll_napi	Eric W. Biederman	1	-3/+3
	This moves the control logic to the top level in netpoll_poll_dev instead of having it dispersed throughout netpoll_poll_dev, poll_napi and poll_one_napi. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netpoll: move setting of NETPOLL_RX_DROP into netpoll_poll_dev	Eric W. Biederman	1	-8/+8
	Today netpoll depends on setting NETPOLL_RX_DROP before networking drivers receive packets in interrupt context so that the packets can be dropped. Move this setting into netpoll_poll_dev from poll_one_napi so that if ndo_poll_controller happens to receive packets we will drop the packets on the floor instead of letting the packets bounce through the networking stack and potentially cause problems. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next	David S. Miller	37	-452/+1431
	Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: * cleanup to remove double semicolon from stephen hemminger. * calm down sparse warning in xt_ipcomp, from Fan Du. * nf_ct_labels support for nf_tables, from Florian Westphal. * new macros to simplify rcu dereferences in the scope of nfnetlink and nf_tables, from Patrick McHardy. * Accept queue and drop (including reason for drop) to verdict parsing in nf_tables, also from Patrick. * Remove unused random seed initialization in nfnetlink_log, from Florian Westphal. * Allow to attach user-specific information to nf_tables rules, useful to attach user comments to rule, from me. * Return errors in ipset according to the manpage documentation, from Jozsef Kadlecsik. * Fix coccinelle warnings related to incorrect bool type usage for ipset, from Fengguang Wu. * Add hash:ip,mark set type to ipset, from Vytas Dauksa. * Fix message for each spotted by ipset for each netns that is created, from Ilia Mirkin. * Add forceadd option to ipset, which evicts a random entry from the set if it becomes full, from Josh Hunt. * Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu. * Improve conntrack scalability by removing a central spinlock, original work from Eric Dumazet. Jesper Dangaard Brouer took them over to address remaining issues. Several patches to prepare this change come in first place. * Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization on element removal, etc. from Patrick McHardy. * Restore context in the rule deletion path, as we now release rule objects synchronously, from Patrick McHardy. This gets back event notification for anonymous sets. * Fix NAT family validation in nft_nat, also from Patrick. * Improve scalability of xt_connlimit by using an array of spinlocks and by introducing a rb-tree of hashtables for faster lookup of accounted objects per network. This patch was preceded by several patches and refactorizations to accomodate this change including the use of kmem_cache, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17	netfilter: connlimit: use rbtree for per-host conntrack obj storage	Florian Westphal	1	-47/+177
	With current match design every invocation of the connlimit_match function means we have to perform (number_of_conntracks % 256) lookups in the conntrack table [ to perform GC/delete stale entries ]. This is also the reason why ____nf_conntrack_find() in perf top has > 20% cpu time per core. This patch changes the storage to rbtree which cuts down the number of ct objects that need testing. When looking up a new tuple, we only test the connections of the host objects we visit while searching for the wanted host/network (or the leaf we need to insert at). The slot count is reduced to 32. Increasing slot count doesn't speed up things much because of rbtree nature. before patch (50kpps rx, 10kpps tx): + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw after (90kpps, 51kpps tx): + 17.24% swapper [nf_conntrack] [k] ____nf_conntrack_find + 6.60% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 2.73% swapper [nf_conntrack] [k] hash_conntrack_raw + 2.36% swapper [xt_connlimit] [k] count_tree Obvious disadvantages to previous version are the increase in code complexity and the increased memory cost. Partially based on Eric Dumazets fq scheduler. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-17	netfilter: connlimit: make same_source_net signed	Florian Westphal	1	-4/+5
	currently returns 1 if they're the same. Make it work like mem/strcmp so it can be used as rbtree search function. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-17	netfilter: connlimit: use keyed locks	Florian Westphal	1	-8/+18
	connlimit currently suffers from spinlock contention, example for 4-core system with rps enabled: + 20.84% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock_bh + 20.76% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock_bh + 20.42% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh + 6.07% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 6.07% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 5.97% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 2.47% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 2.45% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw + 2.44% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw May allow parallel lookup/insert/delete if the entry is hashed to another slot. With patch: + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw + 2.00% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock Improved rx processing rate from ~35kpps to ~50 kpps. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-14	net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq	Eric W. Biederman	8	-16/+16
	Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	28	-121/+199
	Conflicts: drivers/net/usb/r8152.c drivers/net/xen-netback/netback.c Both the r8152 and netback conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14	ieee802154: add proper length checks to header creations	Phoebe Buckheister	3	-2/+5
	Have mac802154 header_ops.create fail with -EMSGSIZE if the length passed will be too large to fit a frame. Since 6lowpan will ensure that no packet payload will be too large, pass a length of 0 there. 802.15.4 dgram sockets will also return -EMSGSIZE on payloads larger than the device MTU instead of -EINVAL. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14	6lowpan: move lowpan frag_info out of 802.15.4 headers	Phoebe Buckheister	1	-9/+18
	Fragmentation and reassembly information for 6lowpan is independent from the 802.15.4 stack and used only by the 6lowpan reassembly process. Move the ieee802154_frag_info struct to a private are, it needn't be in the 802.15.4 skb control block. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14	ieee802154: use ieee802154_addr instead of *_sa variants	Phoebe Buckheister	7	-123/+133
	Change all internal uses of ieee802154_addr_sa to ieee802154_addr, except for those instances that communicate directly with userspace. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14	mac802154: use header operations to create/parse headers	Phoebe Buckheister	7	-296/+152
	Use the operations on 802.15.4 header structs introduced in a previous patch to create and parse all headers in the mac802154 stack. This patch reduces code duplication between different parts of the mac802154 stack that needed information from headers, and also fixes a few bugs that seem to have gone unnoticed until now: * 802.15.4 dgram sockets would return a slightly incorrect value for the SIOCINQ ioctl * mac802154 would not drop frames with the "security enabled" bit set, even though it does not support security, in violation of the standard Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14	ieee802154: add header structs with endiannes and operations	Phoebe Buckheister	2	-1/+289
	This patch provides a set of structures to represent 802.15.4 MAC headers, and a set of operations to push/pull/peek these structs from skbs. We cannot simply pointer-cast the skb MAC header pointer to these structs, because 802.15.4 headers are wildly variable - depending on the first three bytes, virtually all other fields of the header may be present or not, and be present with different lengths. The new header creation/parsing routines also support 802.15.4 security headers, which are currently not supported by the mac802154 implementation of the protocol. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14	ieee802154: enforce consistent endianness in the 802.15.4 stack	Phoebe Buckheister	11	-59/+84
	Enable sparse warnings about endianness, replace the remaining fields regarding network operations without explicit endianness annotations with such that are annotated, and propagate this through the entire stack. Uses of ieee802154_addr_sa are not changed yet, this patch is only concerned with all other fields (such as address filters, operation parameters and the likes). Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14	ieee802154: rename struct ieee802154_addr to *_sa	Phoebe Buckheister	9	-30/+31
	The struct as currently defined uses host byte order for some fields, and most big endian/EUI display byte order for other fields. Inside the stack, endianness should ideally match network byte order where possible to minimize the number of byteswaps done in critical paths, but this patch does not address this; it is only preparatory. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>