aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter (follow)
AgeCommit message (Collapse)AuthorFilesLines
2009-11-18Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller1-0/+1
Conflicts: drivers/net/sfc/sfe4001.c drivers/net/wireless/libertas/cmd.c drivers/staging/Kconfig drivers/staging/Makefile drivers/staging/rtl8187se/Kconfig drivers/staging/rtl8192e/Kconfig
2009-11-18netns: net_identifiers should be read_mostlyEric Dumazet2-2/+2
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds3-44/+38
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits) net/fsl_pq_mdio: add module license GPL can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo() can: should not use __dev_get_by_index() without locks hisax: remove bad udelay call to fix build error on ARM ipip: Fix handling of DF packets when pmtudisc is OFF qlge: Set PCIe reset type for EEH to fundamental. qlge: Fix early exit from mbox cmd complete wait. ixgbe: fix traffic hangs on Tx with ioatdma loaded ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled ixgbe: Fix gso_max_size for 82599 when DCB is enabled macsonic: fix crash on PowerBook 520 NET: cassini, fix lock imbalance ems_usb: Fix byte order issues on big endian machines be2net: Bug fix to send config commands to hardware after netdev_register be2net: fix to set proper flow control on resume netfilter: xt_connlimit: fix regression caused by zero family value rt2x00: Don't queue ieee80211 work after USB removal Revert "ipw2200: fix oops on missing firmware" decnet: netdevice refcount leak netfilter: nf_nat: fix NAT issue in 2.6.30.4+ ...
2009-11-08Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller1-6/+4
Conflicts: drivers/net/can/usb/ems_usb.c
2009-11-06netfilter: xt_connlimit: fix regression caused by zero family valueJan Engelhardt1-6/+4
Commit v2.6.28-rc1~717^2~109^2~2 was slightly incomplete; not all instances of par->match->family were changed to par->family. References: http://bugzilla.netfilter.org/show_bug.cgi?id=610 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller2-38/+34
Conflicts: drivers/net/usb/cdc_ether.c All CDC ethernet devices of type USB_CLASS_COMM need to use '&mbm_info'. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06netfilter: nf_nat: fix NAT issue in 2.6.30.4+Jozsef Kadlecsik2-38/+34
Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work over NAT. The "cause" of the problem was a fix of unacknowledged data detection with NAT (commit a3a9f79e361e864f0e9d75ebe2a0cb43d17c4272). However, actually, that fix uncovered a long standing bug in TCP conntrack: when NAT was enabled, we simply updated the max of the right edge of the segments we have seen (td_end), by the offset NAT produced with changing IP/port in the data. However, we did not update the other parameter (td_maxend) which is affected by the NAT offset. Thus that could drift away from the correct value and thus resulted breaking active FTP. The patch below fixes the issue by *not* updating the conntrack parameters from NAT, but instead taking into account the NAT offsets in conntrack in a consistent way. (Updating from NAT would be more harder and expensive because it'd need to re-calculate parameters we already calculated in conntrack.) Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18inet: rename some inet_sock fieldsEric Dumazet1-1/+1
In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-11headers: remove sched.h from interrupt.hAlexey Dobriyan1-0/+1
After m68k's task_thread_info() doesn't refer to current, it's possible to remove sched.h from interrupt.h and not break m68k! Many thanks to Heiko Carstens for allowing this. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-09-30net: Make setsockopt() optlen be unsigned.David S. Miller1-2/+2
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24sysctl: remove "struct file *" argument of ->proc_handlerAlexey Dobriyan2-6/+6
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22mm: replace various uses of num_physpages by totalram_pagesJan Beulich3-7/+7
Sizing of memory allocations shouldn't depend on the number of physical pages found in a system, as that generally includes (perhaps a huge amount of) non-RAM pages. The amount of what actually is usable as storage should instead be used as a basis here. Some of the calculations (i.e. those not intending to use high memory) should likely even use (totalram_pages - totalhigh_pages). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Airlie <airlied@linux.ie> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-10Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6David S. Miller18-869/+136
2009-09-02Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller2-2/+2
Conflicts: drivers/net/yellowfin.c
2009-08-31IPVS: Add handling of incoming ICMPV6 messagesJulius Volz1-6/+17
Add handling of incoming ICMPv6 messages. This follows the handling of IPv4 ICMP messages. Amongst ther things this problem allows IPVS to behave sensibly when an ICMPV6_PKT_TOOBIG message is received: This message is received when a realserver sends a packet >PMTU to the client. The hop on this path with insufficient MTU will generate an ICMPv6 Packet Too Big message back to the VIP. The LVS server receives this message, but the call to the function handling this has been missing. Thus, IPVS fails to forward the message to the real server, which then does not adjust the path MTU. This patch adds the missing call to ip_vs_in_icmp_v6() in ip_vs_in() to handle this situation. Thanks to Rob Gallagher from HEAnet for reporting this issue and for testing this patch in production (with direct routing mode). [horms@verge.net.au: tweaked changelog] Signed-off-by: Julius Volz <julius.volz@gmail.com> Tested-by: Rob Gallagher <robert.gallagher@heanet.ie> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31netfilter: nf_conntrack: netns fix re reliable conntrack event deliveryAlexey Dobriyan1-3/+3
Conntracks in netns other than init_net dying list were never killed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31ipvs: Use atomic operations atomiclySimon Horman2-6/+7
A pointed out by Shin Hong, IPVS doesn't always use atomic operations in an atomic manner. While this seems unlikely to be manifest in strange behaviour, it seems appropriate to clean this up. Cc: shin hong <hongshin@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25netfilter: nfnetlink: constify message attributes and headersPatrick McHardy6-30/+49
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-24netfilter: xtables: mark initial tables constantJan Engelhardt1-3/+4
The inputted table is never modified, so should be considered const. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-23netfilter: xt_quota: fix wrong return value (error case)Patrick McHardy1-1/+1
Success was indicated on a memory allocation failure, thereby causing a crash due to a later NULL deref. (Affects v2.6.30-rc1 up to here.) Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-17net: restore gnet_stats_basic to previous definitionEric Dumazet1-1/+1
In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc for better SMP performance" the definition of struct gnet_stats_basic changed incompatibly, as copies of this struct are shipped to userland via netlink. Restoring old behavior is not welcome, for performance reason. Fix is to use a private structure for kernel, and teach gnet_stats_copy_basic() to convert from kernel to user land, using legacy structure (struct gnet_stats_basic) Based on a report and initial patch from Michael Spang. Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-10netfilter: xtables: remove xt_owner v0Jan Engelhardt1-118/+12
Superseded by xt_owner v1 (v2.6.24-2388-g0265ab4). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_mark v0Jan Engelhardt1-76/+10
Superseded by xt_mark v1 (v2.6.24-2922-g17b0d7e). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_iprange v0Jan Engelhardt1-43/+2
Superseded by xt_iprange v1 (v2.6.24-2928-g1a50c5a1). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_conntrack v0Jan Engelhardt1-154/+1
Superseded by xt_conntrack v1 (v2.6.24-2921-g64eb12f). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_connmark v0Jan Engelhardt1-90/+11
Superseded by xt_connmark v1 (v2.6.24-2919-g96e3227). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_MARK v0, v1Jan Engelhardt1-154/+9
Superseded by xt_MARK v2 (v2.6.24-2918-ge0a812a). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_CONNMARK v0Jan Engelhardt1-123/+11
Superseded by xt_CONNMARK v1 (v2.6.24-2917-g0dc8c76). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_TOS v0Jan Engelhardt2-63/+0
Superseded by xt_TOS v1 (v2.6.24-2396-g5c350e5). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-05net: mark read-only arrays as constJan Engelhardt3-3/+4
String literals are constant, and usually, we can also tag the array of pointers const too, moving it to the .rodata section. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02IPVS: use pr_err and friends instead of IP_VS_ERR and friendsHannes Eder20-147/+145
Since pr_err and friends are used instead of printk there is no point in keeping IP_VS_ERR and friends. Furthermore make use of '__func__' instead of hard coded function names. Signed-off-by: Hannes Eder <heder@google.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-30IPVS: use pr_fmtHannes Eder23-4/+74
While being at it cleanup whitespace. Signed-off-by: Hannes Eder <heder@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-16Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller2-5/+21
Conflicts: drivers/net/wireless/orinoco/main.c
2009-07-16netfilter: nf_conntrack: nf_conntrack_alloc() fixesEric Dumazet1-3/+18
When a slab cache uses SLAB_DESTROY_BY_RCU, we must be careful when allocating objects, since slab allocator could give a freed object still used by lockless readers. In particular, nf_conntrack RCU lookups rely on ct->tuplehash[xxx].hnnode.next being always valid (ie containing a valid 'nulls' value, or a valid pointer to next object in hash chain.) kmem_cache_zalloc() setups object with NULL values, but a NULL value is not valid for ct->tuplehash[xxx].hnnode.next. Fix is to call kmem_cache_alloc() and do the zeroing ourself. As spotted by Patrick, we also need to make sure lookup keys are committed to memory before setting refcount to 1, or a lockless reader could get a reference on the old version of the object. Its key re-check could then pass the barrier. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-07-16netfilter: xt_osf: fix nf_log_packet() argumentsPatrick McHardy1-2/+3
The first argument is the address family, the second one the hook number. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-07-12genetlink: make netns awareJohannes Berg1-1/+1
This makes generic netlink network namespace aware. No generic netlink families except for the controller family are made namespace aware, they need to be checked one by one and then set the family->netnsok member to true. A new function genlmsg_multicast_netns() is introduced to allow sending a multicast message in a given namespace, for example when it applies to an object that lives in that namespace, a new function genlmsg_multicast_allns() to send a message to all network namespaces (for objects that do not have an associated netns). The function genlmsg_multicast() is changed to multicast the message in just init_net, which is currently correct for all generic netlink families since they only work in init_net right now. Some will later want to work in all net namespaces because they do not care about the netns at all -- those will have to be converted to use one of the new functions genlmsg_multicast_allns() or genlmsg_multicast_netns() whenever they are made netns aware in some way. After this patch families can easily decide whether or not they should be available in all net namespaces. Many genl families us it for objects not related to networking and should therefore be available in all namespaces, but that will have to be done on a per family basis. Note that this doesn't touch on the checkpoint/restart problem where network namespaces could be used, genl families and multicast groups are numbered globally and I see no easy way of changing that, especially since it must be possible to multicast to all network namespaces for those families that do not care about netns. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-29netfilter: xtables: conntrack match revision 2Jan Engelhardt1-6/+60
As reported by Philip, the UNTRACKED state bit does not fit within the 8-bit state_mask member. Enlarge state_mask and give status_mask a few more bits too. Reported-by: Philip Craig <philipc@snapgear.com> References: http://markmail.org/thread/b7eg6aovfh4agyz7 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-29netfilter: tcp conntrack: fix unacknowledged data detection with NATPatrick McHardy1-3/+3
When NAT helpers change the TCP packet size, the highest seen sequence number needs to be corrected. This is currently only done upwards, when the packet size is reduced the sequence number is unchanged. This causes TCP conntrack to falsely detect unacknowledged data and decrease the timeout. Fix by updating the highest seen sequence number in both directions after packet mangling. Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-25nf_conntrack: Use rcu_barrier()Jesper Dangaard Brouer2-2/+4
RCU barriers, rcu_barrier(), is inserted two places. In nf_conntrack_expect.c nf_conntrack_expect_fini() before the kmem_cache_destroy(). Firstly to make sure the callback to the nf_ct_expect_free_rcu() code is still around. Secondly because I'm unsure about the consequence of having in flight nf_ct_expect_free_rcu/kmem_cache_free() calls while doing a kmem_cache_destroy() slab destroy. And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is invoked by __nf_ct_ext_add(). It might be more efficient to call rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but thats make it more difficult to read the code (as the callback code in located in nf_conntrack_extend.c). Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22netfilter: xt_rateest: fix comparison with selfPatrick McHardy1-1/+1
As noticed by Török Edwin <edwintorok@gmail.com>: Compiling the kernel with clang has shown this warning: net/netfilter/xt_rateest.c:69:16: warning: self-comparison always results in a constant value ret &= pps2 == pps2; ^ Looking at the code: if (info->flags & XT_RATEEST_MATCH_BPS) ret &= bps1 == bps2; if (info->flags & XT_RATEEST_MATCH_PPS) ret &= pps2 == pps2; Judging from the MATCH_BPS case it seems to be a typo, with the intention of comparing pps1 with pps2. http://bugzilla.kernel.org/show_bug.cgi?id=13535 Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22netfilter: xt_quota: fix incomplete initializationJan Engelhardt1-0/+1
Commit v2.6.29-rc5-872-gacc738f ("xtables: avoid pointer to self") forgot to copy the initial quota value supplied by iptables into the private structure, thus counting from whatever was in the memory kmalloc returned. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22netfilter: nf_log: fix direct userspace memory access in proc handlerPatrick McHardy1-5/+11
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22netfilter: fix some sparse endianess warningsPatrick McHardy2-8/+8
net/netfilter/xt_NFQUEUE.c:46:9: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:46:9: expected unsigned int [unsigned] [usertype] ipaddr net/netfilter/xt_NFQUEUE.c:46:9: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:68:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:68:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:68:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:69:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:69:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:69:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:70:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:70:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:70:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:71:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:71:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:71:10: got restricted unsigned int net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types) net/netfilter/xt_cluster.c:20:55: expected unsigned int net/netfilter/xt_cluster.c:20:55: got restricted unsigned int const [usertype] ip net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types) net/netfilter/xt_cluster.c:20:55: expected unsigned int net/netfilter/xt_cluster.c:20:55: got restricted unsigned int const [usertype] ip Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22netfilter: nf_conntrack: fix conntrack lookup racePatrick McHardy1-2/+4
The RCU protected conntrack hash lookup only checks whether the entry has a refcount of zero to decide whether it is stale. This is not sufficient, entries are explicitly removed while there is at least one reference left, possibly more. Explicitly check whether the entry has been marked as dying to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22netfilter: nf_conntrack: fix confirmation race conditionPatrick McHardy1-1/+8
New connection tracking entries are inserted into the hash before they are fully set up, namely the CONFIRMED bit is not set and the timer not started yet. This can theoretically lead to a race with timer, which would set the timeout value to a relative value, most likely already in the past. Perform hash insertion as the final step to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22netfilter: nf_conntrack: death_by_timeout() fixEric Dumazet1-2/+8
death_by_timeout() might delete a conntrack from hash list and insert it in dying list. nf_ct_delete_from_lists(ct); nf_ct_insert_dying_list(ct); I believe a (lockless) reader could *catch* ct while doing a lookup and miss the end of its chain. (nulls lookup algo must check the null value at the end of lookup and should restart if the null value is not the expected one. cf Documentation/RCU/rculist_nulls.txt for details) We need to change nf_conntrack_init_net() and use a different "null" value, guaranteed not being used in regular lists. Choose very large values, since hash table uses [0..size-1] null values. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-15Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6David S. Miller2-3/+3
Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c
2009-06-13x_tables: Convert printk to pr_errJoe Perches1-8/+8
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13netfilter: conntrack: optional reliable conntrack event deliveryPablo Neira Ayuso3-18/+118
This patch improves ctnetlink event reliability if one broadcast listener has set the NETLINK_BROADCAST_ERROR socket option. The logic is the following: if an event delivery fails, we keep the undelivered events in the missed event cache. Once the next packet arrives, we add the new events (if any) to the missed events in the cache and we try a new delivery, and so on. Thus, if ctnetlink fails to deliver an event, we try to deliver them once we see a new packet. Therefore, we may lose state transitions but the userspace process gets in sync at some point. At worst case, if no events were delivered to userspace, we make sure that destroy events are successfully delivered. Basically, if ctnetlink fails to deliver the destroy event, we remove the conntrack entry from the hashes and we insert them in the dying list, which contains inactive entries. Then, the conntrack timer is added with an extra grace timeout of random32() % 15 seconds to trigger the event again (this grace timeout is tunable via /proc). The use of a limited random timeout value allows distributing the "destroy" resends, thus, avoiding accumulating lots "destroy" events at the same time. Event delivery may re-order but we can identify them by means of the tuple plus the conntrack ID. The maximum number of conntrack entries (active or inactive) is still handled by nf_conntrack_max. Thus, we may start dropping packets at some point if we accumulate a lot of inactive conntrack entries that did not successfully report the destroy event to userspace. During my stress tests consisting of setting a very small buffer of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket flag, and generating lots of very small connections, I noticed very few destroy entries on the fly waiting to be resend. A simple way to test this patch consist of creating a lot of entries, set a very small Netlink buffer in conntrackd (+ a patch which is not in the git tree to set the BROADCAST_ERROR flag) and invoke `conntrack -F'. For expectations, no changes are introduced in this patch. Currently, event delivery is only done for new expectations (no events from expectation expiration, removal and confirmation). In that case, they need a per-expectation event cache to implement the same idea that is exposed in this patch. This patch can be useful to provide reliable flow-accouting. We still have to add a new conntrack extension to store the creation and destroy time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13netfilter: conntrack: move helper destruction to nf_ct_helper_destroy()Pablo Neira Ayuso2-10/+15
This patch moves the helper destruction to a function that lives in nf_conntrack_helper.c. This new function is used in the patch to add ctnetlink reliable event delivery. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>