aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter (follow)
AgeCommit message (Collapse)AuthorFilesLines
2010-02-03netfilter: xt_hashlimit: fix race condition and simplify lockingPatrick McHardy1-34/+18
As noticed by Shin Hong <hongshin@gmail.com>, there is a race between htable_find_get() and htable_put(): htable_put(): htable_find_get(): spin_lock_bh(&hashlimit_lock); <search entry> atomic_dec_and_test(&hinfo->use) atomic_inc(&hinfo->use) spin_unlock_bh(&hashlimit_lock) return hinfo; spin_lock_bh(&hashlimit_lock); hlist_del(&hinfo->node); spin_unlock_bh(&hashlimit_lock); htable_destroy(hinfo); The entire locking concept is overly complicated, tables are only created/referenced and released in process context, so a single mutex works just fine. Remove the hashinfo_spinlock and atomic reference count and use the mutex to protect table lookups/creation and reference count changes. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-02Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6David S. Miller2-2/+3
2010-02-02netfilter: xt_TCPMSS: SYN packets are allowed to contain dataSimon Arlott1-10/+8
The TCPMSS target is dropping SYN packets where: 1) There is data, or 2) The data offset makes the TCP header larger than the packet. Both of these result in an error level printk. This printk has been removed. This change avoids dropping SYN packets containing data. If there is also no MSS option (as well as data), one will not be added because of possible complications due to the increased packet size. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-26netfilter: ctnetlink: fix expectation mask dumpPatrick McHardy1-1/+2
The protocol number is not initialized, so userspace can't interpret the layer 4 data properly. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-19netfilter: nf_conntrack_sip: fix off-by-one in compact header parsingPatrick McHardy1-1/+1
In a string like "v:SIP/2.0..." it was checking for !isalpha('S') when it meant to be inspecting the ':'. Patch by Greg Alexander <greqcs@galexander.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-18netfilter: nfnetlink_queue: simplify warning messageEric Leblond1-2/+2
This patch remove variable part from a debug message to have message concatenation from syslog. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-18netfilter: xt_hashlimit: netns supportAlexey Dobriyan1-43/+98
Make hashtable per-netns. Make proc files per-netns. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-18netfilter: xt_recent: netns supportAlexey Dobriyan1-41/+95
Make recent table list per-netns. Make proc files per-netns. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-18netfilter: xt_hashlimit: simplify seqfile codeAlexey Dobriyan1-9/+5
Simply pass hashtable to seqfile iterators, proc entry itself is not needed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-18netfilter: xt_connlimit: netns supportAlexey Dobriyan1-3/+5
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-13netfilter: ctnetlink: netns supportAlexey Dobriyan1-26/+39
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-13netfilter: nfnetlink: netns supportAlexey Dobriyan4-31/+52
Make nfnl socket per-petns. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds4-25/+25
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits) sky2: Fix oops in sky2_xmit_frame() after TX timeout Documentation/3c509: document ethtool support af_packet: Don't use skb after dev_queue_xmit() vxge: use pci_dma_mapping_error to test return value netfilter: ebtables: enforce CAP_NET_ADMIN e1000e: fix and commonize code for setting the receive address registers e1000e: e1000e_enable_tx_pkt_filtering() returns wrong value e1000e: perform 10/100 adaptive IFS only on parts that support it e1000e: don't accumulate PHY statistics on PHY read failure e1000e: call pci_save_state() after pci_restore_state() netxen: update version to 4.0.72 netxen: fix set mac addr netxen: fix smatch warning netxen: fix tx ring memory leak tcp: update the netstamp_needed counter when cloning sockets TI DaVinci EMAC: Handle emac module clock correctly. dmfe/tulip: Let dmfe handle DM910x except for SPARC on-board chips ixgbe: Fix compiler warning about variable being used uninitialized netfilter: nf_ct_ftp: fix out of bounds read in update_nl_seq() mv643xx_eth: don't include cache padding in rx desc buffer size ... Fix trivial conflict in drivers/scsi/cxgb3i/cxgb3i_offload.c
2010-01-11netfilter: xt_osf: change %pi4 to %pI4Joe Perches1-2/+2
commit 8a27f7c90ffcb791eed7574922b51fb60b08fc89 changed the output style of %pi4 to use fixed width leading zero IP addresses "001.002.003.004". It's useful when printing multiple lines of addresses, but was a change in output style for some existing uses. Using %pI4 restores the previous output style. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-11ipvs: use standardized format in sprintfJoe Perches1-1/+1
Use the same format string as net/ipv4/netfilter/nf_nat_ftp.c to encode an ipv4 address and port. Both uses should be a single common function. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-07netfilter: nf_ct_ftp: fix out of bounds read in update_nl_seq()Patrick McHardy1-9/+9
As noticed by Dan Carpenter <error27@gmail.com>, update_nl_seq() currently contains an out of bounds read of the seq_aft_nl array when looking for the oldest sequence number position. Fix it to only compare valid positions. Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-05IPVS: Allow boot time change of hash sizeCatalin(ux) M. BOIE3-15/+39
I was very frustrated about the fact that I have to recompile the kernel to change the hash size. So, I created this patch. If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel command line, or, if you built IPVS as modules, you can add options ip_vs conn_tab_bits=??. To keep everything backward compatible, you still can select the size at compile time, and that will be used as default. It has been about a year since this patch was originally posted and subsequently dropped on the basis of insufficient test data. Mark Bergsma has provided the following test results which seem to strongly support the need for larger hash table sizes: We do however run into the same problem with the default setting (212 = 4096 entries), as most of our LVS balancers handle around a million connections/SLAB entries at any point in time (around 100-150 kpps load). With only 4096 hash table entries this implies that each entry consists of a linked list of 256 connections *on average*. To provide some statistics, I did an oprofile run on an 2.6.31 kernel, with both the default 4096 table size, and the same kernel recompiled with IP_VS_CONN_TAB_BITS set to 18 (218 = 262144 entries). I built a quick test setup with a part of Wikimedia/Wikipedia's live traffic mirrored by the switch to the test host. With the default setting, at ~ 120 kpps packet load we saw a typical %si CPU usage of around 30-35%, and oprofile reported a hot spot in ip_vs_conn_in_get: samples % image name app name symbol name 1719761 42.3741 ip_vs.ko ip_vs.ko ip_vs_conn_in_get 302577 7.4554 bnx2 bnx2 /bnx2 181984 4.4840 vmlinux vmlinux __ticket_spin_lock 128636 3.1695 vmlinux vmlinux ip_route_input 74345 1.8318 ip_vs.ko ip_vs.ko ip_vs_conn_out_get 68482 1.6874 vmlinux vmlinux mwait_idle After loading the recompiled kernel with 218 entries, %si CPU usage dropped in half to around 12-18%, and oprofile looks much healthier, with only 7% spent in ip_vs_conn_in_get: samples % image name app name symbol name 265641 14.4616 bnx2 bnx2 /bnx2 143251 7.7986 vmlinux vmlinux __ticket_spin_lock 140661 7.6576 ip_vs.ko ip_vs.ko ip_vs_conn_in_get 94364 5.1372 vmlinux vmlinux mwait_idle 86267 4.6964 vmlinux vmlinux ip_route_input [ horms@verge.net.au: trivial up-port and minor style fixes ] Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro> Cc: Mark Bergsma <mark@wikimedia.org> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-04ipvs: Add boundary check on ioctl argumentsArjan van de Ven1-1/+13
The ipvs code has a nifty system for doing the size of ioctl command copies; it defines an array with values into which it indexes the cmd to find the right length. Unfortunately, the ipvs code forgot to check if the cmd was in the range that the array provides, allowing for an index outside of the array, which then gives a "garbage" result into the length, which then gets used for copying into a stack buffer. Fix this by adding sanity checks on these as well as the copy size. [ horms@verge.net.au: adjusted limit to IP_VS_SO_GET_MAX ] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-04netfilter: xtables: obtain random bytes earlier, in checkentryJan Engelhardt2-23/+14
We can initialize the random hash bytes on checkentry. This is preferable since it is outside the hot path. Reference: http://bugzilla.netfilter.org/show_bug.cgi?id=621 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-04netfilter: xtables: do not grab random bytes at __initJan Engelhardt2-2/+11
"It is deliberately not done in the init function, since we might not have sufficient random while booting." Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-04netfilter: xt_recent: save 8 bytes per htableJan Engelhardt1-4/+4
Moving rnd_inited into the hole after the uint8 lets go of the uint32 rnd_inited was using, plus the padding that would follow the int group. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-22ipvs: ip_vs_wrr.c: use lib/gcd.cFlorian Fainelli2-15/+3
Remove the private version of the greatest common divider to use lib/gcd.c, the latter also implementing the a < b case. [akpm@linux-foundation.org: repair neighboring whitespace because the diff looked odd] Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Takashi Iwai <tiwai@suse.de> Acked-by: Simon Horman <horms@verge.net.au> Cc: Julius Volz <juliusv@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2-0/+5
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits) net: sh_eth alignment fix for sh7724 using NET_IP_ALIGN V2 ixgbe: allow tx of pre-formatted vlan tagged packets ixgbe: Fix 82598 premature copper PHY link indicatation ixgbe: Fix tx_restart_queue/non_eop_desc statistics counters bcm63xx_enet: fix compilation failure after get_stats_count removal packet: dont call sleeping functions while holding rcu_read_lock() tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. ipvs: zero usvc and udest netfilter: fix crashes in bridge netfilter caused by fragment jumps ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery sky2: leave PCI config space writeable sky2: print Optima chip name x25: Update maintainer. ipvs: fix synchronization on connection close netfilter: xtables: document minimal required version drivers/net/bonding/: : use pr_fmt can: CAN_MCP251X should depend on HAS_DMA drivers/net/usb: Correct code taking the size of a pointer drivers/net/cpmac.c: Correct code taking the size of a pointer drivers/net/sfc: Correct code taking the size of a pointer ...
2009-12-15tree-wide: convert open calls to remove spaces to skip_spaces() lib functionAndré Goddard Rosa1-2/+1
Makes use of skip_spaces() defined in lib/string.c for removing leading spaces from strings all over the tree. It decreases lib.a code size by 47 bytes and reuses the function tree-wide: text data bss dec hex filename 64688 584 592 65864 10148 (TOTALS-BEFORE) 64641 584 592 65817 10119 (TOTALS-AFTER) Also, while at it, if we see (*str && isspace(*str)), we can be sure to remove the first condition (*str) as the second one (isspace(*str)) also evaluates to 0 whenever *str == 0, making it redundant. In other words, "a char equals zero is never a space". Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below, and found occurrences of this pattern on 3 more files: drivers/leds/led-class.c drivers/leds/ledtrig-timer.c drivers/video/output.c @@ expression str; @@ ( // ignore skip_spaces cases while (*str && isspace(*str)) { \(str++;\|++str;\) } | - *str && isspace(*str) ) Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Cc: Julia Lawall <julia@diku.dk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Neil Brown <neilb@suse.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: David Howells <dhowells@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15ipvs: zero usvc and udestSimon Horman1-0/+4
Make sure that any otherwise uninitialised fields of usvc are zero. This has been obvserved to cause a problem whereby the port of fwmark services may end up as a non-zero value which causes scheduling of a destination server to fail for persisitent services. As observed by Deon van der Merwe <dvdm@truteq.co.za>. This fix suggested by Julian Anastasov <ja@ssi.bg>. For good measure also zero udest. Cc: Deon van der Merwe <dvdm@truteq.co.za> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-14ipvs: fix synchronization on connection closeXiaotian Feng1-0/+1
commit 9d3a0de makes slaves expire as they would do on the master with much shorter timeouts. But it introduces another problem: When we close a connection, on master server the connection became CLOSE_WAIT/TIME_WAIT, it was synced to slaves, but if master is finished within it's timeouts (CLOSE), it will not be synced to slaves. Then slaves will be kept on CLOSE_WAIT/TIME_WAIT until timeout reaches. Thus we should also sync with CLOSE. Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds10-112/+99
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6Linus Torvalds14-71/+23
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...
2009-12-03Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6David S. Miller6-64/+74
2009-12-01net: Simplify conntrack_proto_gre pernet operations.Eric W. Biederman1-14/+6
Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01net: Simplify conntrack_proto_dccp pernet operations.Eric W. Biederman1-23/+8
Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29net: Move && and || to end of previous lineJoe Perches2-7/+7
Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller3-15/+7
Conflicts: drivers/ieee802154/fakehard.c drivers/net/e1000e/ich8lan.c drivers/net/e1000e/phy.c drivers/net/netxen/netxen_nic_init.c drivers/net/wireless/ath/ath9k/main.c
2009-11-25net: use net_eq to compare netsOctavian Purdila1-1/+1
Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-23Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6David S. Miller1-1/+1
2009-11-23netfilter: xt_limit: fix invalid return code in limit_mt_check()Patrick McHardy1-1/+1
Commit acc738fe (netfilter: xtables: avoid pointer to self) introduced an invalid return value in limit_mt_check(). Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-23netfilter: xtables: fix conntrack match v1 ipt-save outputFlorian Westphal1-44/+17
commit d6d3f08b0fd998b647a05540cedd11a067b72867 (netfilter: xtables: conntrack match revision 2) does break the v1 conntrack match iptables-save output in a subtle way. Problem is as follows: up = kmalloc(sizeof(*up), GFP_KERNEL); [..] /* * The strategy here is to minimize the overhead of v1 matching, * by prebuilding a v2 struct and putting the pointer into the * v1 dataspace. */ memcpy(up, info, offsetof(typeof(*info), state_mask)); [..] *(void **)info = up; As the v2 struct pointer is saved in the match data space, it clobbers the first structure member (->origsrc_addr). Because the _v1 match function grabs this pointer and does not actually look at the v1 origsrc, run time functionality does not break. But iptables -nvL (or iptables-save) cannot know that v1 origsrc_addr has been overloaded in this way: $ iptables -p tcp -A OUTPUT -m conntrack --ctorigsrc 10.0.0.1 -j ACCEPT $ iptables-save -A OUTPUT -p tcp -m conntrack --ctorigsrc 128.173.134.206 -j ACCEPT (128.173... is the address to the v2 match structure). To fix this, we take advantage of the fact that the v1 and v2 structures are identical with exception of the last two structure members (u8 in v1, u16 in v2). We extract them as early as possible and prevent the v2 matching function from looking at those two members directly. Previously reported by Michel Messerschmidt via Ben Hutchings, also see Debian Bug tracker #556587. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-23netfilter: nf_ct_tcp: improve out-of-sync situation in TCP trackingPablo Neira Ayuso1-10/+41
Without this patch, if we receive a SYN packet from the client while the firewall is out-of-sync, we let it go through. Then, if we see the SYN/ACK reply coming from the server, we destroy the conntrack entry and drop the packet to trigger a new retransmission. Then, the retransmision from the client is used to start a new clean session. This patch improves the current handling. Basically, if we see an unexpected SYN packet, we annotate the TCP options. Then, if we see the reply SYN/ACK, this means that the firewall was indeed out-of-sync. Therefore, we set a clean new session from the existing entry based on the annotated values. This patch adds two new 8-bits fields that fit in a 16-bits gap of the ip_ct_tcp structure. This patch is particularly useful for conntrackd since the asynchronous nature of the state-synchronization allows to have backup nodes that are not perfect copies of the master. This helps to improve the recovery under some worst-case scenarios. I have tested this by creating lots of conntrack entries in wrong state: for ((i=1024;i<65535;i++)); do conntrack -I -p tcp -s 192.168.2.101 -d 192.168.2.2 --sport $i --dport 80 -t 800 --state ESTABLISHED -u ASSURED,SEEN_REPLY; done Then, I make some TCP connections: $ echo GET / | nc 192.168.2.2 80 The events show the result: [UPDATE] tcp 6 60 SYN_RECV src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] [UPDATE] tcp 6 120 FIN_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] [UPDATE] tcp 6 30 LAST_ACK src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] [UPDATE] tcp 6 120 TIME_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] and tcpdump shows no retransmissions: 20:47:57.271951 IP 192.168.2.101.33221 > 192.168.2.2.www: S 435402517:435402517(0) win 5840 <mss 1460,sackOK,timestamp 4294961827 0,nop,wscale 6> 20:47:57.273538 IP 192.168.2.2.www > 192.168.2.101.33221: S 3509927945:3509927945(0) ack 435402518 win 5792 <mss 1460,sackOK,timestamp 235681024 4294961827,nop,wscale 4> 20:47:57.273608 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024> 20:47:57.273693 IP 192.168.2.101.33221 > 192.168.2.2.www: P 435402518:435402524(6) ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024> 20:47:57.275492 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402524 win 362 <nop,nop,timestamp 235681024 4294961827> 20:47:57.276492 IP 192.168.2.2.www > 192.168.2.101.33221: P 3509927946:3509928082(136) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827> 20:47:57.276515 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509928082 win 108 <nop,nop,timestamp 4294961828 235681025> 20:47:57.276521 IP 192.168.2.2.www > 192.168.2.101.33221: F 3509928082:3509928082(0) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827> 20:47:57.277369 IP 192.168.2.101.33221 > 192.168.2.2.www: F 435402524:435402524(0) ack 3509928083 win 108 <nop,nop,timestamp 4294961828 235681025> 20:47:57.279491 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402525 win 362 <nop,nop,timestamp 235681025 4294961828> I also added a rule to log invalid packets, with no occurrences :-) . Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-19netfilter: nf_log: fix sleeping function called from invalid context in seq_show()Patrick McHardy1-13/+5
[ 171.925285] BUG: sleeping function called from invalid context at kernel/mutex.c:280 [ 171.925296] in_atomic(): 1, irqs_disabled(): 0, pid: 671, name: grep [ 171.925306] 2 locks held by grep/671: [ 171.925312] #0: (&p->lock){+.+.+.}, at: [<c10b8acd>] seq_read+0x25/0x36c [ 171.925340] #1: (rcu_read_lock){.+.+..}, at: [<c1391dac>] seq_start+0x0/0x44 [ 171.925372] Pid: 671, comm: grep Not tainted 2.6.31.6-4-netbook #3 [ 171.925380] Call Trace: [ 171.925398] [<c105104e>] ? __debug_show_held_locks+0x1e/0x20 [ 171.925414] [<c10264ac>] __might_sleep+0xfb/0x102 [ 171.925430] [<c1461521>] mutex_lock_nested+0x1c/0x2ad [ 171.925444] [<c1391c9e>] seq_show+0x74/0x127 [ 171.925456] [<c10b8c5c>] seq_read+0x1b4/0x36c [ 171.925469] [<c10b8aa8>] ? seq_read+0x0/0x36c [ 171.925483] [<c10d5c8e>] proc_reg_read+0x60/0x74 [ 171.925496] [<c10d5c2e>] ? proc_reg_read+0x0/0x74 [ 171.925510] [<c10a4468>] vfs_read+0x87/0x110 [ 171.925523] [<c10a458a>] sys_read+0x3b/0x60 [ 171.925538] [<c1002a49>] syscall_call+0x7/0xb Fix it by replacing RCU with nf_log_mutex. Reported-by: "Yin, Kangkai" <kangkai.yin@intel.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-19netfilter: xt_osf: fix xt_osf_remove_callback() return valuePatrick McHardy1-1/+1
Return a negative error value. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller1-0/+1
Conflicts: drivers/net/sfc/sfe4001.c drivers/net/wireless/libertas/cmd.c drivers/staging/Kconfig drivers/staging/Makefile drivers/staging/rtl8187se/Kconfig drivers/staging/rtl8192e/Kconfig
2009-11-18netns: net_identifiers should be read_mostlyEric Dumazet2-2/+2
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17Merge commit 'v2.6.32-rc7'Eric W. Biederman3-44/+38
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler gets a small bug fix and the sysctl tree where I am removing all sysctl strategy routines.
2009-11-13netfilter: nf_log: fix sleeping function called from invalid context in seq_show()Wu Fengguang1-13/+5
[ 171.925285] BUG: sleeping function called from invalid context at kernel/mutex.c:280 [ 171.925296] in_atomic(): 1, irqs_disabled(): 0, pid: 671, name: grep [ 171.925306] 2 locks held by grep/671: [ 171.925312] #0: (&p->lock){+.+.+.}, at: [<c10b8acd>] seq_read+0x25/0x36c [ 171.925340] #1: (rcu_read_lock){.+.+..}, at: [<c1391dac>] seq_start+0x0/0x44 [ 171.925372] Pid: 671, comm: grep Not tainted 2.6.31.6-4-netbook #3 [ 171.925380] Call Trace: [ 171.925398] [<c105104e>] ? __debug_show_held_locks+0x1e/0x20 [ 171.925414] [<c10264ac>] __might_sleep+0xfb/0x102 [ 171.925430] [<c1461521>] mutex_lock_nested+0x1c/0x2ad [ 171.925444] [<c1391c9e>] seq_show+0x74/0x127 [ 171.925456] [<c10b8c5c>] seq_read+0x1b4/0x36c [ 171.925469] [<c10b8aa8>] ? seq_read+0x0/0x36c [ 171.925483] [<c10d5c8e>] proc_reg_read+0x60/0x74 [ 171.925496] [<c10d5c2e>] ? proc_reg_read+0x0/0x74 [ 171.925510] [<c10a4468>] vfs_read+0x87/0x110 [ 171.925523] [<c10a458a>] sys_read+0x3b/0x60 [ 171.925538] [<c1002a49>] syscall_call+0x7/0xb Fix it by replacing RCU with nf_log_mutex. Reported-by: "Yin, Kangkai" <kangkai.yin@intel.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-13netfilter: xt_osf: fix xt_osf_remove_callback() return valueRoel Kluin1-1/+1
Return a negative error value. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-12sysctl net: Remove unused binary sysctl codeEric W. Biederman14-71/+23
Now that sys_sysctl is a compatiblity wrapper around /proc/sys all sysctl strategy routines, and all ctl_name and strategy entries in the sysctl tables are unused, and can be revmoed. In addition neigh_sysctl_register has been modified to no longer take a strategy argument and it's callers have been modified not to pass one. Cc: "David Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds3-44/+38
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits) net/fsl_pq_mdio: add module license GPL can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo() can: should not use __dev_get_by_index() without locks hisax: remove bad udelay call to fix build error on ARM ipip: Fix handling of DF packets when pmtudisc is OFF qlge: Set PCIe reset type for EEH to fundamental. qlge: Fix early exit from mbox cmd complete wait. ixgbe: fix traffic hangs on Tx with ioatdma loaded ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled ixgbe: Fix gso_max_size for 82599 when DCB is enabled macsonic: fix crash on PowerBook 520 NET: cassini, fix lock imbalance ems_usb: Fix byte order issues on big endian machines be2net: Bug fix to send config commands to hardware after netdev_register be2net: fix to set proper flow control on resume netfilter: xt_connlimit: fix regression caused by zero family value rt2x00: Don't queue ieee80211 work after USB removal Revert "ipw2200: fix oops on missing firmware" decnet: netdevice refcount leak netfilter: nf_nat: fix NAT issue in 2.6.30.4+ ...
2009-11-08Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller1-6/+4
Conflicts: drivers/net/can/usb/ems_usb.c
2009-11-06netfilter: xt_connlimit: fix regression caused by zero family valueJan Engelhardt1-6/+4
Commit v2.6.28-rc1~717^2~109^2~2 was slightly incomplete; not all instances of par->match->family were changed to par->family. References: http://bugzilla.netfilter.org/show_bug.cgi?id=610 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06netfilter: remove unneccessary checks from netlink notifiersPatrick McHardy2-4/+2
The NETLINK_URELEASE notifier is only invoked for bound sockets, so there is no need to check ->pid again. Signed-off-by: Patrick McHardy <kaber@trash.net>