aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter/xt_CT.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-01-18netfilter: conntrack: remove nf_ct_l4proto_find_getFlorian Westphal1-1/+1
Its now same as __nf_ct_l4proto_find(), so rename that to nf_ct_l4proto_find and use it everywhere. It never returns NULL and doesn't need locks or reference counts. Before this series: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko After: 294864 net/netfilter/nf_conntrack.ko text data bss dec hex filename 106979 19557 240 126776 1ef38 nf_conntrack.ko so, even with builtin gre, total size got reduced. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: remove l3->l4 mapping informationFlorian Westphal1-1/+1
l4 protocols are demuxed by l3num, l4num pair. However, almost all l4 trackers are l3 agnostic. Only exceptions are: - gre, icmp (ipv4 only) - icmpv6 (ipv6 only) This commit gets rid of the l3 mapping, l4 trackers can now be looked up by their IPPROTO_XXX value alone, which gets rid of the additional l3 indirection. For icmp, ipcmp6 and gre, add a check on state->pf and return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4, this seems more fitting than using the generic tracker. Additionally we can kill the 2nd l4proto definitions that were needed for v4/v6 split -- they are now the same so we can use single l4proto struct for each protocol, rather than two. The EXPORT_SYMBOLs can be removed as all these object files are part of nf_conntrack with no external references. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout objectPablo Neira Ayuso1-2/+2
The timeout policy is currently embedded into the nfnetlink_cttimeout object, move the policy into an independent object. This allows us to reuse part of the existing conntrack timeout extension from nf_tables without adding dependencies with the nfnetlink_cttimeout object layout. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: Remove useless param helper of nf_ct_helper_ext_addGao Feng1-1/+1
The param helper of nf_ct_helper_ext_add is useless now, then remove it now. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: xt_CT: Reject the non-null terminated string from user spaceGao Feng1-0/+10
The helper and timeout strings are from user-space, we need to make sure they are null terminated. If not, evil user could make kernel read the unexpected memory, even print it when fail to find by the following codes. pr_info_ratelimited("No such helper \"%s\"\n", helper_name); Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: xt_CT: use pr ratelimitingFlorian Westphal1-12/+13
checkpatch complains about line > 80 but this would require splitting "literal" over two lines which is worse. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-24netfilter: conntrack: make protocol tracker pointers constFlorian Westphal1-1/+1
Doesn't change generated code, but will make it easier to eventually make the actual trackers themselvers const. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15netfilter: introduce nf_conntrack_helper_put helper functionLiping Zhang1-3/+3
And convert module_put invocation to nf_conntrack_helper_put, this is prepared for the followup patch, which will add a refcnt for cthelper, so we can reject the deleting request when cthelper is in use. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-2/+9
Pablo Neira Ayuso says: ==================== Netfilter/IPVS/OVS fixes for net The following patchset contains a rather large batch of Netfilter, IPVS and OVS fixes for your net tree. This includes fixes for ctnetlink, the userspace conntrack helper infrastructure, conntrack OVS support, ebtables DNAT target, several leaks in error path among other. More specifically, they are: 1) Fix reference count leak in the CT target error path, from Gao Feng. 2) Remove conntrack entry clashing with a matching expectation, patch from Jarno Rajahalme. 3) Fix bogus EEXIST when registering two different userspace helpers, from Liping Zhang. 4) Don't leak dummy elements in the new bitmap set type in nf_tables, from Liping Zhang. 5) Get rid of module autoload from conntrack update path in ctnetlink, we don't need autoload at this late stage and it is happening with rcu read lock held which is not good. From Liping Zhang. 6) Fix deadlock due to double-acquire of the expect_lock from conntrack update path, this fixes a bug that was introduced when the central spinlock got removed. Again from Liping Zhang. 7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock protection that was selected when the central spinlock was removed was not really protecting anything at all. 8) Protect sequence adjustment under ct->lock. 9) Missing socket match with IPv6, from Peter Tirsek. 10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from Linus Luessing. 11) Don't give up on evaluating the expression on new entries added via dynset expression in nf_tables, from Liping Zhang. 12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals with non-linear skbuffs. 13) Don't allow IPv6 service in IPVS if no IPv6 support is available, from Paolo Abeni. 14) Missing mutex release in error path of xt_find_table_lock(), from Dan Carpenter. 15) Update maintainers files, Netfilter section. Add Florian to the file, refer to nftables.org and change project status from Supported to Maintained. 16) Bail out on mismatching extensions in element updates in nf_tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24netfilter: xt_CT: fix refcnt leak on error pathGao Feng1-2/+9
There are two cases which causes refcnt leak. 1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should free the timeout refcnt. Now goto the err_put_timeout error handler instead of going ahead. 2. When the time policy is not found, we should call module_put. Otherwise, the related cthelper module cannot be removed anymore. It is easy to reproduce by typing the following command: # iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-15netfilter: kill the fake untracked conntrack objectsFlorian Westphal1-8/+8
resurrect an old patch from Pablo Neira to remove the untracked objects. Currently, there are four possible states of an skb wrt. conntrack. 1. No conntrack attached, ct is NULL. 2. Normal (kmem cache allocated) ct attached. 3. a template (kmalloc'd), not in any hash tables at any point in time 4. the 'untracked' conntrack, a percpu nf_conn object, tagged via IPS_UNTRACKED_BIT in ct->status. Untracked is supposed to be identical to case 1. It exists only so users can check -m conntrack --ctstate UNTRACKED vs. -m conntrack --ctstate INVALID e.g. attempts to set connmark on INVALID or UNTRACKED conntracks is supposed to be a no-op. Thus currently we need to check ct == NULL || nf_ct_is_untracked(ct) in a lot of places in order to avoid altering untracked objects. The other consequence of the percpu untracked object is that all -j NOTRACK (and, later, kfree_skb of such skbs) result in an atomic op (inc/dec the untracked conntracks refcount). This adds a new kernel-private ctinfo state, IP_CT_UNTRACKED, to make the distinction instead. The (few) places that care about packet invalid (ct is NULL) vs. packet untracked now need to test ct == NULL vs. ctinfo == IP_CT_UNTRACKED, but all other places can omit the nf_ct_is_untracked() check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02netfilter: merge ctinfo into nfct pointer storage areaFlorian Westphal1-2/+2
After this change conntrack operations (lookup, creation, matching from ruleset) only access one instead of two sk_buff cache lines. This works for normal conntracks because those are allocated from a slab that guarantees hw cacheline or 8byte alignment (whatever is larger) so the 3 bits needed for ctinfo won't overlap with nf_conn addresses. Template allocation now does manual address alignment (see previous change) on arches that don't have sufficent kmalloc min alignment. Some spots intentionally use skb->_nfct instead of skb_nfct() helpers, this is to avoid undoing the skb_nfct() use when we remove untracked conntrack object in the future. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02netfilter: add and use nf_ct_set helperFlorian Westphal1-4/+2
Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff. This avoids changing code in followup patch that merges skb->nfct and skb->nfctinfo into skb->_nfct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02skbuff: add and use skb_nfct helperFlorian Westphal1-1/+1
Followup patch renames skb->nfct and changes its type so add a helper to avoid intrusive rename change later. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09xtables: extend matches and targets with .usersizeWillem de Bruijn1-0/+3
In matches and targets that define a kernel-only tail to their xt_match and xt_target data structs, add a field .usersize that specifies up to where data is to be shared with userspace. Performed a search for comment "Used internally by the kernel" to find relevant matches and targets. Manually inspected the structs to derive a valid offsetof. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04netfilter: add and use nf_ct_netns_get/putFlorian Westphal1-3/+3
currently aliased to try_module_get/_put. Will be changed in next patch when we add functions to make use of ->net argument to store usercount per l3proto tracker. This is needed to avoid registering the conntrack hooks in all netns and later only enable connection tracking in those that need conntrack. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-14netfilter: cttimeout: add netns supportPablo Neira1-1/+1
Add a per-netns list of timeout objects and adjust code to use it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-12netfilter: conntrack: fix crash on timeout object removalPablo Neira Ayuso1-1/+3
The object and module refcounts are updated for each conntrack template, however, if we delete the iptables rules and we flush the timeout database, we may end up with invalid references to timeout object that are just gone. Resolve this problem by setting the timeout reference to NULL when the custom timeout entry is removed from our base. This patch requires some RCU trickery to ensure safe pointer handling. This handling is similar to what we already do with conntrack helpers, the idea is to avoid bumping the timeout object reference counter from the packet path to avoid the cost of atomic ops. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-12netfilter: xt_CT: don't put back reference to timeout policy objectPablo Neira Ayuso1-0/+3
On success, this shouldn't put back the timeout policy object, otherwise we may have module refcount overflow and we allow deletion of timeout that are still in use. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-1/+1
Conflicts: include/net/netfilter/nf_conntrack.h The conflict was an overlap between changing the type of the zone argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free. Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net, they are: 1) Oneliner to restore maps in nf_tables since we support addressing registers at 32 bits level. 2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n, oneliner from Bernhard Thaler. 3) Out of bound access in ipset hash:net* set types, reported by Dave Jones' KASan utility, patch from Jozsef Kadlecsik. 4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of unnamed unions, patch from Elad Raz. 5) Add a workaround to address inconsistent endianess in the res_id field of nfnetlink batch messages, reported by Florian Westphal. 6) Fix error paths of CT/synproxy since the conntrack template was moved to use kmalloc, patch from Daniel Borkmann. All of them look good to me to reach 4.2, I can route this to -stable myself too, just let me know what you prefer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error pathsDaniel Borkmann1-1/+1
Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") migrated templates to the new allocator api, but forgot to update error paths for them in CT and synproxy to use nf_ct_tmpl_free() instead of nf_conntrack_free(). Due to that, memory is being freed into the wrong kmemcache, but also we drop the per net reference count of ct objects causing an imbalance. In Brad's case, this leads to a wrap-around of net->ct.count and thus lets __nf_conntrack_alloc() refuse to create a new ct object: [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching [ 10.810168] nf_conntrack: table full, dropping packet [ 11.917416] r8169 0000:07:00.0 eth0: link up [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 12.815902] nf_conntrack: table full, dropping packet [ 15.688561] nf_conntrack: table full, dropping packet [ 15.689365] nf_conntrack: table full, dropping packet [ 15.690169] nf_conntrack: table full, dropping packet [ 15.690967] nf_conntrack: table full, dropping packet [...] With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs. nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus, to fix the problem, export and use nf_ct_tmpl_free() instead. Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") Reported-by: Brad Jackson <bjackson0971@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-21Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso1-2/+3
Resolve conflicts with conntrack template fixes. Conflicts: net/netfilter/nf_conntrack_core.c net/netfilter/nf_synproxy_core.c net/netfilter/xt_CT.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-18netfilter: nf_conntrack: add efficient mark to zone mappingDaniel Borkmann1-1/+4
This work adds the possibility of deriving the zone id from the skb->mark field in a scalable manner. This allows for having only a single template serving hundreds/thousands of different zones, for example, instead of the need to have one match for each zone as an extra CT jump target. Note that we'd need to have this information attached to the template as at the time when we're trying to lookup a possible ct object, we already need to know zone information for a possible match when going into __nf_conntrack_find_get(). This work provides a minimal implementation for a possible mapping. In order to not add/expose an extra ct->status bit, the zone structure has been extended to carry a flag for deriving the mark. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-18netfilter: nf_conntrack: add direction support for zonesDaniel Borkmann1-1/+16
This work adds a direction parameter to netfilter zones, so identity separation can be performed only in original/reply or both directions (default). This basically opens up the possibility of doing NAT with conflicting IP address/port tuples from multiple, isolated tenants on a host (e.g. from a netns) without requiring each tenant to NAT twice resp. to use its own dedicated IP address to SNAT to, meaning overlapping tuples can be made unique with the zone identifier in original direction, where the NAT engine will then allocate a unique tuple in the commonly shared default zone for the reply direction. In some restricted, local DNAT cases, also port redirection could be used for making the reply traffic unique w/o requiring SNAT. The consensus we've reached and discussed at NFWS and since the initial implementation [1] was to directly integrate the direction meta data into the existing zones infrastructure, as opposed to the ct->mark approach we proposed initially. As we pass the nf_conntrack_zone object directly around, we don't have to touch all call-sites, but only those, that contain equality checks of zones. Thus, based on the current direction (original or reply), we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID. CT expectations are direction-agnostic entities when expectations are being compared among themselves, so we can only use the identifier in this case. Note that zone identifiers can not be included into the hash mix anymore as they don't contain a "stable" value that would be equal for both directions at all times, f.e. if only zone->id would unconditionally be xor'ed into the table slot hash, then replies won't find the corresponding conntracking entry anymore. If no particular direction is specified when configuring zones, the behaviour is exactly as we expect currently (both directions). Support has been added for the CT netlink interface as well as the x_tables raw CT target, which both already offer existing interfaces to user space for the configuration of zones. Below a minimal, simplified collision example (script in [2]) with netperf sessions: +--- tenant-1 ---+ mark := 1 | netperf |--+ +----------------+ | CT zone := mark [ORIGINAL] [ip,sport] := X +--------------+ +--- gateway ---+ | mark routing |--| SNAT |-- ... + +--------------+ +---------------+ | +--- tenant-2 ---+ | ~~~|~~~ | netperf |--+ +-----------+ | +----------------+ mark := 2 | netserver |------ ... + [ip,sport] := X +-----------+ [ip,port] := Y On the gateway netns, example: iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark conntrack dump from gateway netns: netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2 Taking this further, test script in [2] creates 200 tenants and runs original-tuple colliding netperf sessions each. A conntrack -L dump in the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED state as expected. I also did run various other tests with some permutations of the script, to mention some: SNAT in random/random-fully/persistent mode, no zones (no overlaps), static zones (original, reply, both directions), etc. [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/ [2] https://paste.fedoraproject.org/242835/65657871/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-11netfilter: nf_conntrack: push zone object into functionsDaniel Borkmann1-1/+5
This patch replaces the zone id which is pushed down into functions with the actual zone object. It's a bigger one-time change, but needed for later on extending zones with a direction parameter, and thus decoupling this additional information from all call-sites. No functional changes in this patch. The default zone becomes a global const object, namely nf_ct_zone_dflt and will be returned directly in various cases, one being, when there's f.e. no zoning support. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-30netfilter: nf_conntrack: checking for IS_ERR() instead of NULLDan Carpenter1-2/+3
We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc() so the error handling needs to changed to check for NULL instead of IS_ERR(). Fixes: 0838aa7fcfcd ('netfilter: fix netns dependencies with conntrack templates') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-20netfilter: fix netns dependencies with conntrack templatesPablo Neira Ayuso1-5/+3
Quoting Daniel Borkmann: "When adding connection tracking template rules to a netns, f.e. to configure netfilter zones, the kernel will endlessly busy-loop as soon as we try to delete the given netns in case there's at least one template present, which is problematic i.e. if there is such bravery that the priviledged user inside the netns is assumed untrusted. Minimal example: ip netns add foo ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1 ip netns del foo What happens is that when nf_ct_iterate_cleanup() is being called from nf_conntrack_cleanup_net_list() for a provided netns, we always end up with a net->ct.count > 0 and thus jump back to i_see_dead_people. We don't get a soft-lockup as we still have a schedule() point, but the serving CPU spins on 100% from that point onwards. Since templates are normally allocated with nf_conntrack_alloc(), we also bump net->ct.count. The issue why they are not yet nf_ct_put() is because the per netns .exit() handler from x_tables (which would eventually invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is called in the dependency chain at a *later* point in time than the per netns .exit() handler for the connection tracker. This is clearly a chicken'n'egg problem: after the connection tracker .exit() handler, we've teared down all the connection tracking infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be invoked at a later point in time during the netns cleanup, as that would lead to a use-after-free. At the same time, we cannot make x_tables depend on the connection tracker module, so that the xt_ct_tg_destroy() would be invoked earlier in the cleanup chain." Daniel confirms this has to do with the order in which modules are loaded or having compiled nf_conntrack as modules while x_tables built-in. So we have no guarantees regarding the order in which netns callbacks are executed. Fix this by allocating the templates through kmalloc() from the respective SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache. Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch is marked as unlikely since conntrack templates are rarely allocated and only from the configuration plane path. Note that templates are not kept in any list to avoid further dependencies with nf_conntrack anymore, thus, the tmpl larval list is removed. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Daniel Borkmann <daniel@iogearbox.net>
2014-02-05netfilter: nf_conntrack: don't release a conntrack with non-zero refcntPablo Neira Ayuso1-6/+1
With this patch, the conntrack refcount is initially set to zero and it is bumped once it is added to any of the list, so we fulfill Eric's golden rule which is that all released objects always have a refcount that equals zero. Andrey Vagin reports that nf_conntrack_free can't be called for a conntrack with non-zero ref-counter, because it can race with nf_conntrack_find_get(). A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero ref-counter says that this conntrack is used. So when we release a conntrack with non-zero counter, we break this assumption. CPU1 CPU2 ____nf_conntrack_find() nf_ct_put() destroy_conntrack() ... init_conntrack __nf_conntrack_alloc (set use = 1) atomic_inc_not_zero(&ct->use) (use = 2) if (!l4proto->new(ct, skb, dataoff, timeouts)) nf_conntrack_free(ct); (use = 2 !!!) ... __nf_conntrack_alloc (set use = 1) if (!nf_ct_key_equal(h, tuple, zone)) nf_ct_put(ct); (use = 0) destroy_conntrack() /* continue to work with CT */ After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get" another bug was triggered in destroy_conntrack(): <4>[67096.759334] ------------[ cut here ]------------ <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211! ... <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack] <4>[67096.760255] Call Trace: <4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30 <4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack] <4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack] <4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4] <4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910 <4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0 <4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140 <4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140 <4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60 <4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20 <4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40 <4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190 <4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200 <4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290 <4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210 <4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 I have reused the original title for the RFC patch that Andrey posted and most of the original patch description. Cc: Eric Dumazet <edumazet@google.com> Cc: Andrew Vagin <avagin@parallels.com> Cc: Florian Westphal <fw@strlen.de> Reported-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrew Vagin <avagin@parallels.com>
2014-01-03netfilter: xt_CT: fix error value in xt_ct_tg_check()Eric Leblond1-1/+3
If setting event mask fails then we were returning 0 for success. This patch updates return code to -EINVAL in case of problem. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23netfilter: xt_CT: optimize XT_CT_NOTRACKEric Dumazet1-4/+6
The percpu untracked ct are not currently used for XT_CT_NOTRACK. xt_ct_tg_check()/xt_ct_target() provides a single ct. Thats not optimal as the ct->ct_general.use cache line will bounce among cpus. Use the intended [1] thing : xt_ct_target() should select the percpu object. [1] Refs : commit 5bfddbd46a95c97 ("netfilter: nf_conntrack: IPS_UNTRACKED bit") commit b3c5163fe0193a7 ("netfilter: nf_conntrack: per_cpu untracking") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-05netfilter: xt_CT: add alias flagPablo Neira Ayuso1-3/+29
This patch adds the alias flag to support full NOTRACK target aliasing. Based on initial patch from Jozsef Kadlecsik. Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hi> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-05netfilter: xt_CT: merge common code of revision 0 and 1Pablo Neira Ayuso1-89/+56
This patch merges the common code for revision 0 and 1. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-10netfilter: xt_CT: fix unset return value if conntrack zone are disabledPablo Neira Ayuso1-2/+2
net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v1’: net/netfilter/xt_CT.c:250:6: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v0’: net/netfilter/xt_CT.c:112:6: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] Reported-by: Borislav Petkov <bp@alien8.de> Acked-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-24netfilter: xt_CT: recover NOTRACK target supportPablo Neira Ayuso1-1/+49
Florian Westphal reported that the removal of the NOTRACK target (9655050 netfilter: remove xt_NOTRACK) is breaking some existing setups. That removal was scheduled for removal since long time ago as described in Documentation/feature-removal-schedule.txt What: xt_NOTRACK Files: net/netfilter/xt_NOTRACK.c When: April 2011 Why: Superseded by xt_CT Still, people may have not notice / may have decided to stick to an old iptables version. I agree with him in that some more conservative approach by spotting some printk to warn users for some time is less agressive. Current iptables 1.4.16.3 already contains the aliasing support that makes it point to the CT target, so upgrading would fix it. Still, the policy so far has been to avoid pushing our users to upgrade. As a solution, this patch recovers the NOTRACK target inside the CT target and it now spots a warning. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16netfilter: xt_CT: fix crash while destroy ct templatesPablo Neira Ayuso1-0/+8
In (d871bef netfilter: ctnetlink: dump entries from the dying and unconfirmed lists), we assume that all conntrack objects are inserted in any of the existing lists. However, template conntrack objects were not. This results in hitting BUG_ON in the destroy_conntrack path while removing a rule that uses the CT target. This patch fixes the situation by adding the template lists, which is where template conntrack objects reside now. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-15netfilter: xt_CT: fix timeout setting with IPv6Pablo Neira Ayuso1-4/+6
This patch fixes ip6tables and the CT target if it is used to set some custom conntrack timeout policy for IPv6. Use xt_ct_find_proto which already handles the ip6tables case for us. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03netfilter: xt_CT: refactorize xt_ct_tg_checkPablo Neira Ayuso1-136/+126
This patch adds xt_ct_set_helper and xt_ct_set_timeout to reduce the size of xt_ct_tg_check. This aims to improve code mantainability by splitting xt_ct_tg_check in smaller chunks. Suggested by Eric Dumazet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16netfilter: nf_ct_helper: implement variable length helper private dataPablo Neira Ayuso1-18/+26
This patch uses the new variable length conntrack extensions. Instead of using union nf_conntrack_help that contain all the helper private data information, we allocate variable length area to store the private helper data. This patch includes the modification of all existing helpers. It also includes a couple of include header to avoid compilation warnings. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17netfilter: xt_CT: remove redundant header includeEldad Zack1-1/+0
nf_conntrack_l4proto.h is included twice. Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-30netfilter: xt_CT: fix wrong checking in the timeout assignment pathPablo Neira Ayuso1-1/+1
The current checking always succeeded. We have to check the first character of the string to check that it's empty, thus, skipping the timeout path. This fixes the use of the CT target without the timeout option. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-03netfilter: xt_CT: fix missing put timeout object in error pathPablo Neira Ayuso1-5/+19
The error path misses putting the timeout object. This patch adds new function xt_ct_tg_timeout_put() to put the timeout object. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03netfilter: xt_CT: allocation has to be GFP_ATOMIC under rcu_read_lock sectionPablo Neira Ayuso1-1/+1
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03netfilter: xt_CT: remove a compile warningPablo Neira Ayuso1-0/+2
If CONFIG_NF_CONNTRACK_TIMEOUT=n we have following warning : CC [M] net/netfilter/xt_CT.o net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v1’: net/netfilter/xt_CT.c:284: warning: label ‘err4’ defined but not used Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23netfilter: xt_CT: fix assignation of the generic protocol trackerPablo Neira Ayuso1-1/+8
`iptables -p all' uses 0 to match all protocols, while the conntrack subsystem uses 255. We still need `-p all' to attach the custom timeout policies for the generic protocol tracker. Moreover, we may use `iptables -p sctp' while the SCTP tracker is not loaded. In that case, we have to default on the generic protocol tracker. Another possibility is `iptables -p ip' that should be supported as well. This patch makes sure we validate all possible scenarios. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23netfilter: xt_CT: missing rcu_read_lock section in timeout assignmentPablo Neira Ayuso1-6/+12
Fix a dereference to pointer without rcu_read_lock held. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23netfilter: cttimeout: fix dependency with l4protocol conntrack modulePablo Neira Ayuso1-2/+4
This patch introduces nf_conntrack_l4proto_find_get() and nf_conntrack_l4proto_put() to fix module dependencies between timeout objects and l4-protocol conntrack modules. Thus, we make sure that the module cannot be removed if it is used by any of the cttimeout objects. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07netfilter: xt_CT: allow to attach timeout policy + glue codePablo Neira Ayuso1-15/+205
This patch allows you to attach the timeout policy via the CT target, it adds a new revision of the target to ensure backward compatibility. Moreover, it also contains the glue code to stick the timeout object defined via nfnetlink_cttimeout to the given flow. Example usage (it requires installing the nfct tool and libnetfilter_cttimeout): 1) create the timeout policy: nfct timeout add tcp-policy0 inet tcp \ established 1000 close 10 time_wait 10 last_ack 10 2) attach the timeout policy to the packet: iptables -I PREROUTING -t raw -p tcp -j CT --timeout tcp-policy0 You have to install the following user-space software: a) libnetfilter_cttimeout: git://git.netfilter.org/libnetfilter_cttimeout b) nfct: git://git.netfilter.org/nfct You also have to get iptables with -j CT --timeout support. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-16netfilter: revert user-space expectation helper supportPablo Neira Ayuso1-5/+3
This patch partially reverts: 3d058d7 netfilter: rework user-space expectation helper support that was applied during the 3.2 development cycle. After this patch, the tree remains just like before patch bc01bef, that initially added the preliminary infrastructure. I decided to partially revert this patch because the approach that I proposed to resolve this problem is broken in NAT setups. Moreover, a new infrastructure will be submitted for the 3.3.x development cycle that resolve the existing issues while providing a neat solution. Since nobody has been seriously using this infrastructure in user-space, the removal of this feature should affect any know FOSS project (to my knowledge). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23netfilter: rework user-space expectation helper supportPablo Neira Ayuso1-3/+5
This partially reworks bc01befdcf3e40979eb518085a075cbf0aacede0 which added userspace expectation support. This patch removes the nf_ct_userspace_expect_list since now we force to use the new iptables CT target feature to add the helper extension for conntracks that have attached expectations from userspace. A new version of the proof-of-concept code to implement userspace helpers from userspace is available at: http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-POC.tar.bz2 This patch also modifies the CT target to allow to set the conntrack's userspace helper status flags. This flag is used to tell the conntrack system to explicitly allocate the helper extension. This helper extension is useful to link the userspace expectations with the master conntrack that is being tracked from one userspace helper. This feature fixes a problem in the current approach of the userspace helper support. Basically, if the master conntrack that has got a userspace expectation vanishes, the expectations point to one invalid memory address. Thus, triggering an oops in the expectation deletion event path. I decided not to add a new revision of the CT target because I only needed to add a new flag for it. I'll document in this issue in the iptables manpage. I have also changed the return value from EINVAL to EOPNOTSUPP if one flag not supported is specified. Thus, in the future adding new features that only require a new flag can be added without a new revision. There is no official code using this in userspace (apart from the proof-of-concept) that uses this infrastructure but there will be some by beginning 2012. Reported-by: Sam Roberts <vieuxtech@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-04-21netfilter: xt_CT: provide info on why a rule was rejectedJan Engelhardt1-3/+8
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>