aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller11-94/+172
Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22netfilter: fix nf_l4proto_log_invalid to log invalid packetsAndrei Vagin1-1/+1
It doesn't log a packet if sysctl_log_invalid isn't equal to protonum OR sysctl_log_invalid isn't equal to IPPROTO_RAW. This sentence is always true. I believe we need to replace OR to AND. Cc: Florian Westphal <fw@strlen.de> Fixes: c4f3db1595827 ("netfilter: conntrack: add and use nf_l4proto_log_invalid") Signed-off-by: Andrei Vagin <avagin@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-22netfilter: never get/set skb->tstampFlorian Westphal4-16/+18
setting net.netfilter.nf_conntrack_timestamp=1 breaks xmit with fq scheduler. skb->tstamp might be "refreshed" using ktime_get_real(), but fq expects CLOCK_MONOTONIC. This patch removes all places in netfilter that check/set skb->tstamp: 1. To fix the bogus "start" time seen with conntrack timestamping for outgoing packets, never use skb->tstamp and always use current time. 2. In nfqueue and nflog, only use skb->tstamp for incoming packets, as determined by current hook (prerouting, input, forward). 3. xt_time has to use system clock as well rather than skb->tstamp. We could still use skb->tstamp for prerouting/input/foward, but I see no advantage to make this conditional. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Cc: Eric Dumazet <edumazet@google.com> Reported-by: Michal Soltys <soltys@ziu.info> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-15netfilter: nat: fix icmp id randomizationFlorian Westphal1-3/+8
Sven Auhagen reported that a 2nd ping request will fail if 'fully-random' mode is used. Reason is that if no proto information is given, min/max are both 0, so we set the icmp id to 0 instead of chosing a random value between 0 and 65535. Update test case as well to catch this, without fix this yields: [..] ERROR: cannot ping ns1 from ns2 with ip masquerade fully-random (attempt 2) ERROR: cannot ping ns1 from ns2 with ipv6 masquerade fully-random (attempt 2) ... becaus 2nd ping clashes with existing 'id 0' icmp conntrack and gets dropped. Fixes: 203f2e78200c27e ("netfilter: nat: remove l4proto->unique_tuple") Reported-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-15netfilter: nf_tables: prevent shift wrap in nft_chain_parse_hook()Dan Carpenter1-1/+1
I believe that "hook->num" can be up to UINT_MAX. Shifting more than 31 bits would is undefined in C but in practice it would lead to shift wrapping. That would lead to an array overflow in nf_tables_addchain(): ops->hook = hook.type->hooks[ops->hooknum]; Fixes: fe19c04ca137 ("netfilter: nf_tables: remove nhooks field from struct nft_af_info") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-15netfilter: ctnetlink: don't use conntrack/expect object addresses as idFlorian Westphal2-5/+64
else, we leak the addresses to userspace via ctnetlink events and dumps. Compute an ID on demand based on the immutable parts of nf_conn struct. Another advantage compared to using an address is that there is no immediate re-use of the same ID in case the conntrack entry is freed and reallocated again immediately. Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID") Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-13ipvs: do not schedule icmp errors from tunnelsJulian Anastasov1-1/+1
We can receive ICMP errors from client or from tunneling real server. While the former can be scheduled to real server, the latter should not be scheduled, they are decapsulated only when existing connection is found. Fixes: 6044eeffafbe ("ipvs: attempt to schedule icmp packets") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-13netfilter: conntrack: initialize ct->timeoutAlexander Potapenko1-0/+1
KMSAN started reporting an error when accessing ct->timeout for the first time without initialization: BUG: KMSAN: uninit-value in __nf_ct_refresh_acct+0x1ae/0x470 net/netfilter/nf_conntrack_core.c:1765 ... dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:624 __msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310 __nf_ct_refresh_acct+0x1ae/0x470 net/netfilter/nf_conntrack_core.c:1765 nf_ct_refresh_acct ./include/net/netfilter/nf_conntrack.h:201 nf_conntrack_udp_packet+0xb44/0x1040 net/netfilter/nf_conntrack_proto_udp.c:122 nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1605 nf_conntrack_in+0x1250/0x26c9 net/netfilter/nf_conntrack_core.c:1696 ... Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205 kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:159 kmsan_kmalloc+0xa9/0x130 mm/kmsan/kmsan_hooks.c:173 kmem_cache_alloc+0x554/0xb10 mm/slub.c:2789 __nf_conntrack_alloc+0x16f/0x690 net/netfilter/nf_conntrack_core.c:1342 init_conntrack+0x6cb/0x2490 net/netfilter/nf_conntrack_core.c:1421 Signed-off-by: Alexander Potapenko <glider@google.com> Fixes: cc16921351d8ba1 ("netfilter: conntrack: avoid same-timeout update") Cc: Florian Westphal <fw@strlen.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-13netfilter: conntrack: don't set related state for different outer addressFlorian Westphal2-67/+78
Luca Moro says: ------ The issue lies in the filtering of ICMP and ICMPv6 errors that include an inner IP datagram. For these packets, icmp_error_message() extract the ICMP error and inner layer to search of a known state. If a state is found the packet is tagged as related (IP_CT_RELATED). The problem is that there is no correlation check between the inner and outer layer of the packet. So one can encapsulate an error with an inner layer matching a known state, while its outer layer is directed to a filtered host. In this case the whole packet will be tagged as related. This has various implications from a rule bypass (if a rule to related trafic is allow), to a known state oracle. Unfortunately, we could not find a real statement in a RFC on how this case should be filtered. The closest we found is RFC5927 (Section 4.3) but it is not very clear. A possible fix would be to check that the inner IP source is the same than the outer destination. We believed this kind of attack was not documented yet, so we started to write a blog post about it. You can find it attached to this mail (sorry for the extract quality). It contains more technical details, PoC and discussion about the identified behavior. We discovered later that https://www.gont.com.ar/papers/filtering-of-icmp-error-messages.pdf described a similar attack concept in 2004 but without the stateful filtering in mind. ----- This implements above suggested fix: In icmp(v6) error handler, take outer destination address, then pass that into the common function that does the "related" association. After obtaining the nf_conn of the matching inner-headers connection, check that the destination address of the opposite direction tuple is the same as the outer address and only set RELATED if thats the case. Reported-by: Luca Moro <luca.moro@synacktiv.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12bridge: netfilter: unroll NF_HOOK helper in bridge input pathFlorian Westphal3-3/+2
Replace NF_HOOK() based invocation of the netfilter hooks with a private copy of nf_hook_slow(). This copy has one difference: it can return the rx handler value expected by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS. This is needed by the next patch to invoke the ebtables "broute" table via the standard netfilter hooks rather than the custom "br_should_route_hook" indirection that is used now. When the skb is to be "brouted", we must return RX_HANDLER_PASS from the bridge rx input handler, but there is no way to indicate this via NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the netfilter core or a percpu flag. text data bss dec filename 3369 56 0 3425 net/bridge/br_input.o.before 3458 40 0 3498 net/bridge/br_input.o.after This allows removal of the "br_should_route_hook" in the next patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-11netfilter: nf_nat_masquerade: unify ipv4/6 notifier registrationFlorian Westphal3-91/+42
Only reason for having two different register functions was because of ipt_MASQUERADE and ip6t_MASQUERADE being two different modules. Previous patch merged those into xt_MASQUERADE, so we can merge this too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-11netfilter: x_tables: merge ip and ipv6 masquerade modulesFlorian Westphal3-0/+158
No need to have separate modules for this. before: text data bss dec filename 2038 1168 0 3206 net/ipv4/netfilter/ipt_MASQUERADE.ko 1526 1024 0 2550 net/ipv6/netfilter/ip6t_MASQUERADE.ko after: text data bss dec filename 2521 1296 0 3817 net/netfilter/xt_MASQUERADE.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-11netfilter: nf_nat: merge ip/ip6 masquerade headersFlorian Westphal2-4/+2
Both are now implemented by nf_nat_masquerade.c, so no need to keep different headers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: make two functions staticFlorian Westphal2-5/+3
They have no external callers anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nft_osf: Add version option supportFernando Fernandez Mancera2-12/+32
Add version option support to the nftables "osf" expression. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nft_redir: add inet supportFlorian Westphal1-0/+61
allows to redirect both ipv4 and ipv6 with a single rule in an inet nat table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nft_masq: add inet supportFlorian Westphal1-0/+64
This allows use of a single masquerade rule in nat inet family to handle both ipv4 and ipv6. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: replace NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)Florian Westphal4-15/+10
NF_NAT_NEEDED is true whenever nat support for either ipv4 or ipv6 is enabled. Now that the af-specific nat configuration switches have been removed, IS_ENABLED(CONFIG_NF_NAT) has the same effect. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nf_tables: merge route type into coreFlorian Westphal4-16/+174
very little code, so it really doesn't make sense to have extra modules or even a kconfig knob for this. Merge them and make functionality available unconditionally. The merge makes inet family route support trivial, so add it as well here. Before: text data bss dec hex filename 835 832 0 1667 683 nft_chain_route_ipv4.ko 870 832 0 1702 6a6 nft_chain_route_ipv6.ko 111568 2556 529 114653 1bfdd nf_tables.ko After: text data bss dec hex filename 113133 2556 529 116218 1c5fa nf_tables.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nat: add inet family nat supportFlorian Westphal4-17/+136
We need minimal support from the nat core for this, as we do not want to register additional base hooks. When an inet hook is registered, interally register ipv4 and ipv6 hooks for them and unregister those when inet hooks are removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08ipvs: allow tunneling with gue encapsulationJacky Hu2-5/+114
ipip packets are blocked in some public cloud environments, this patch allows gue encapsulation with the tunneling method, which would make tunneling working in those environments. Signed-off-by: Jacky Hu <hengqing.hu@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nf_flowtable: skip device lookup from interface indexPablo Neira Ayuso1-8/+2
Use the output device from the route that we cache in the flowtable entry. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nft_redir: Make nft_redir_dump staticYueHaibing1-1/+1
Fix sparse warning: net/netfilter/nft_redir.c:85:5: warning: symbol 'nft_redir_dump' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nf_tables: remove unused parameter ctxColin Ian King1-3/+2
Function nf_tables_set_desc_parse parameter ctx is not being used so remove it as it is redundant. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08time: Introduce jiffies64_to_msecs()Li RongQing1-3/+1
there is a similar helper in net/netfilter/nf_tables_api.c, this maybe become a common request someday, so move it to time.c Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-07rhashtable: use bit_spin_locks to protect hash bucket.NeilBrown1-1/+0
This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the bucket pointer to lock the hash chain for that bucket. The benefits of a bit spin_lock are: - no need to allocate a separate array of locks. - no need to have a configuration option to guide the choice of the size of this array - locking cost is often a single test-and-set in a cache line that will have to be loaded anyway. When inserting at, or removing from, the head of the chain, the unlock is free - writing the new address in the bucket head implicitly clears the lock bit. For __rhashtable_insert_fast() we ensure this always happens when adding a new key. - even when lockings costs 2 updates (lock and unlock), they are in a cacheline that needs to be read anyway. The cost of using a bit spin_lock is a little bit of code complexity, which I think is quite manageable. Bit spin_locks are sometimes inappropriate because they are not fair - if multiple CPUs repeatedly contend of the same lock, one CPU can easily be starved. This is not a credible situation with rhashtable. Multiple CPUs may want to repeatedly add or remove objects, but they will typically do so at different buckets, so they will attempt to acquire different locks. As we have more bit-locks than we previously had spinlocks (by at least a factor of two) we can expect slightly less contention to go with the slightly better cache behavior and reduced memory consumption. To enhance type checking, a new struct is introduced to represent the pointer plus lock-bit that is stored in the bucket-table. This is "struct rhash_lock_head" and is empty. A pointer to this needs to be cast to either an unsigned lock, or a "struct rhash_head *" to be useful. Variables of this type are most often called "bkt". Previously "pprev" would sometimes point to a bucket, and sometimes a ->next pointer in an rhash_head. As these are now different types, pprev is NULL when it would have pointed to the bucket. In that case, 'blk' is used, together with correct locking protocol. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28netfilter: Export nf_ct_{set,destroy}_timeout()Yi-Hung Wei2-87/+95
This patch exports nf_ct_set_timeout() and nf_ct_destroy_timeout(). The two functions are derived from xt_ct_destroy_timeout() and xt_ct_set_timeout() in xt_CT.c, and moved to nf_conntrack_timeout.c without any functional change. It would be useful for other users (i.e. OVS) that utilizes the finer-grain conntrack timeout feature. CC: Pablo Neira Ayuso <pablo@netfilter.org> CC: Pravin Shelar <pshelar@ovn.org> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller6-30/+41
2019-03-22genetlink: make policy common to familyJohannes Berg1-12/+1
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20netfilter: nf_tables: add missing ->release_ops() in error path of newrule()Taehee Yoo1-1/+4
->release_ops() callback releases resources and this is used in error path. If nf_tables_newrule() fails after ->select_ops(), it should release resources. but it can not call ->destroy() because that should be called after ->init(). At this point, ->release_ops() should be used for releasing resources. Test commands: modprobe -rv xt_tcpudp iptables-nft -I INPUT -m tcp <-- error command lsmod Result: Module Size Used by xt_tcpudp 20480 2 <-- it should be 0 Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18netfilter: nft_redir: fix module autoload with ip4Pablo Neira Ayuso1-1/+1
AF_INET4 does not exist. Fixes: c78efc99c750 ("netfilter: nf_tables: nat: merge nft_redir protocol specific modules)" Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18netfilter: nf_tables: bogus EBUSY in helper removal from transactionPablo Neira Ayuso1-3/+16
Proper use counter updates when activating and deactivating the object, otherwise, this hits bogus EBUSY error. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Laura Garcia <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18netfilter: fix NETFILTER_XT_TARGET_TEE dependenciesArnd Bergmann1-0/+1
With NETFILTER_XT_TARGET_TEE=y and IP6_NF_IPTABLES=m, we get a link error when referencing the NF_DUP_IPV6 module: net/netfilter/xt_TEE.o: In function `tee_tg6': xt_TEE.c:(.text+0x14): undefined reference to `nf_dup_ipv6' The problem here is the 'select NF_DUP_IPV6 if IP6_NF_IPTABLES' that forces NF_DUP_IPV6 to be =m as well rather than setting it to =y as was intended here. Adding a soft dependency on IP6_NF_IPTABLES avoids that broken configuration. Fixes: 5d400a4933e8 ("netfilter: Kconfig: Change select IPv6 dependencies") Cc: Máté Eckl <ecklm94@gmail.com> Cc: Taehee Yoo <ap420073@gmail.com> Link: https://patchwork.ozlabs.org/patch/999498/ Link: https://lore.kernel.org/patchwork/patch/960062/ Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18netfilter: nft_set_rbtree: check for inactive element after flag mismatchPablo Neira Ayuso1-4/+3
Otherwise, we hit bogus ENOENT when removing elements. Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates") Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18netfilter: nf_conntrack_sip: remove direct dependency on IPv6Alin Nastac1-21/+16
Previous implementation was not usable with CONFIG_IPV6=m. Fixes: a3419ce3356c ("netfilter: nf_conntrack_sip: add sip_external_media logic") Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11netfilter: nf_tables: return immediately on empty commitFlorian Westphal1-0/+5
When running 'nft flush ruleset' while no rules exist, we will increment the generation counter and announce a new genid to userspace, yet nothing had changed in the first place. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11netfilter: nf_tables: use-after-free in dynamic operationsPablo Neira Ayuso1-1/+3
Smatch reports: net/netfilter/nf_tables_api.c:2167 nf_tables_expr_destroy() error: dereferencing freed memory 'expr->ops' net/netfilter/nf_tables_api.c 2162 static void nf_tables_expr_destroy(const struct nft_ctx *ctx, 2163 struct nft_expr *expr) 2164 { 2165 if (expr->ops->destroy) 2166 expr->ops->destroy(ctx, expr); ^^^^ --> 2167 module_put(expr->ops->type->owner); ^^^^^^^^^ 2168 } Smatch says there are three functions which free expr->ops. Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11netfilter: nf_tables: bogus EBUSY when deleting set after flushPablo Neira Ayuso4-13/+54
Set deletion after flush coming in the same batch results in EBUSY. Add set use counter to track the number of references to this set from rules. We cannot rely on the list of bindings for this since such list is still populated from the preparation phase. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-08netfilter: nf_tables: fix set double-free in abort pathPablo Neira Ayuso1-6/+11
The abort path can cause a double-free of an anonymous set. Added-and-to-be-aborted rule looks like this: udp dport { 137, 138 } drop The to-be-aborted transaction list looks like this: newset newsetelem newsetelem rule This gets walked in reverse order, so first pass disables the rule, the set elements, then the set. After synchronize_rcu(), we then destroy those in same order: rule, set element, set element, newset. Problem is that the anonymous set has already been bound to the rule, so the rule (lookup expression destructor) already frees the set, when then cause use-after-free when trying to delete the elements from this set, then try to free the set again when handling the newset expression. Rule releases the bound set in first place from the abort path, this causes the use-after-free on set element removal when undoing the new element transactions. To handle this, skip new element transaction if set is bound from the abort path. This is still causes the use-after-free on set element removal. To handle this, remove transaction from the list when the set is already bound. Joint work with Florian Westphal. Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325 Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-08netfilter: nat: don't register device notifier twiceFlorian Westphal1-16/+19
Otherwise, we get notifier list corruption. This is the most simple fix: remove the device notifier call chain from the ipv6 masquerade register function and handle it only in the ipv4 version. The better fix is merge nf_nat_masquerade_ipv4/6_(un)register_notifier into a single nf_nat_masquerade_(un)register_notifiers but to do this its needed to first merge the two masquerade modules into a single xt_MASQUERADE. Furthermore, we need to use different refcounts for ipv4/ipv6 until we can merge MASQUERADE. Fixes: d1aca8ab3104a ("netfilter: nat: merge ipv4 and ipv6 masquerade functionality") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: merge ipv4 and ipv6 nat chain typesFlorian Westphal3-0/+111
Merge the ipv4 and ipv6 nat chain type. This is the last missing piece which allows to provide inet family support for nat in a follow patch. The kconfig knobs for ipv4/ipv6 nat chain are removed, the nat chain type will be built unconditionally if NFT_NAT expression is enabled. Before: text data bss dec hex filename 1576 896 0 2472 9a8 nft_chain_nat_ipv4.ko 1697 896 0 2593 a21 nft_chain_nat_ipv6.ko After: text data bss dec hex filename 1832 896 0 2728 aa8 nft_chain_nat.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: nat: merge nft_masq protocol specific modulesFlorian Westphal2-13/+168
The family specific masq modules are way too small to warrant an extra module, just place all of them in nft_masq. before: text data bss dec hex filename 1001 832 0 1833 729 nft_masq.ko 766 896 0 1662 67e nft_masq_ipv4.ko 764 896 0 1660 67c nft_masq_ipv6.ko after: 2010 960 0 2970 b9a nft_masq.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: nat: merge nft_redir protocol specific modulesFlorian Westphal2-12/+143
before: text data bss dec hex filename 990 832 0 1822 71e nft_redir.ko 697 896 0 1593 639 nft_redir_ipv4.ko 713 896 0 1609 649 nft_redir_ipv6.ko after: text data bss dec hex filename 1910 960 0 2870 b36 nft_redir.ko size is reduced, all helpers from nft_redir.ko can be made static. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: xt_IDLETIMER: fix sysfs callback function typeSami Tolvanen1-10/+4
Use struct device_attribute instead of struct idletimer_tg_attr, and the correct callback function type to avoid indirect call mismatches with Control Flow Integrity checking. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2Li RongQing1-0/+1
CONNTRACK_LOCKS is divisor when computer array index, if it is power of 2, compiler will optimize modulo operation as bitwise AND, or else modulo will lower performance. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: check the result of dereferencing base_chain->statsLi RongQing1-6/+8
Check the result of dereferencing base_chain->stats, instead of result of this_cpu_ptr with NULL. base_chain->stats maybe be changed to NULL when a chain is updated and a new NULL counter can be attached. And we do not need to check returning of this_cpu_ptr since base_chain->stats is from percpu allocator if it is non-NULL, this_cpu_ptr returns a valid value. And fix two sparse error by replacing rcu_access_pointer and rcu_dereference with READ_ONCE under rcu_read_lock. Thanks for Eric's help to finish this patch. Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set") Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01ipvs: get sctphdr by sctphoff in sctp_csum_checkXin Long1-5/+2
sctp_csum_check() is called by sctp_s/dnat_handler() where it calls skb_make_writable() to ensure sctphdr to be linearized. So there's no need to get sctphdr by calling skb_header_pointer() in sctp_csum_check(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: convert the proto argument from u8 to u16Li RongQing1-2/+2
The proto in struct xt_match and struct xt_target is u16, when calling xt_check_target/match, their proto argument is u8, and will cause truncation, it is harmless to ip packet, since ip proto is u8 if a etable's match/target has proto that is u16, will cause the check failure. and convert be16 to short in bridge/netfilter/ebtables.c Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nft_tunnel: Add dst_cache supportwenxu1-0/+7
The metadata_dst does not initialize the dst_cache field, this causes problems to ip_md_tunnel_xmit() since it cannot use this cache, hence, Triggering a route lookup for every packet. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: conntrack: tcp: only close if RST matches exact sequenceFlorian Westphal1-10/+40
TCP resets cause instant transition from established to closed state provided the reset is in-window. Endpoints that implement RFC 5961 require resets to match the next expected sequence number. RST segments that are in-window (but that do not match RCV.NXT) are ignored, and a "challenge ACK" is sent back. Main problem for conntrack is that its a middlebox, i.e. whereas an end host might have ACK'd SEQ (and would thus accept an RST with this sequence number), conntrack might not have seen this ACK (yet). Therefore we can't simply flag RSTs with non-exact match as invalid. This updates RST processing as follows: 1. If the connection is in a state other than ESTABLISHED, nothing is changed, RST is subject to normal in-window check. 2. If the RSTs sequence number either matches exactly RCV.NXT, connection state moves to CLOSE. 3. The same applies if the RST sequence number aligns with a previous packet in the same direction. In all other cases, the connection remains in ESTABLISHED state. If the normal-in-window check passes, the timeout will be lowered to that of CLOSE. If the peer sends a challenge ack, connection timeout will be reset. If the challenge ACK triggers another RST (RST was valid after all), this 2nd RST will match expected sequence and conntrack state changes to CLOSE. If no challenge ACK is received, the connection will time out after CLOSE seconds (10 seconds by default), just like without this patch. Packetdrill test case: 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Receive a segment. 0.210 < P. 1:1001(1000) ack 1 win 46 0.210 > . 1:1(0) ack 1001 // Application writes 1000 bytes. 0.250 write(4, ..., 1000) = 1000 0.250 > P. 1:1001(1000) ack 1001 // First reset, old sequence. Conntrack (correctly) considers this // invalid due to failed window validation (regardless of this patch). 0.260 < R 2:2(0) ack 1001 win 260 // 2nd reset, but too far ahead sequence. Same: correctly handled // as invalid. 0.270 < R 99990001:99990001(0) ack 1001 win 260 // in-window, but not exact sequence. // Current Linux kernels might reply with a challenge ack, and do not // remove connection. // Without this patch, conntrack state moves to CLOSE. // With patch, timeout is lowered like CLOSE, but connection stays // in ESTABLISHED state. 0.280 < R 1010:1010(0) ack 1001 win 260 // Expect challenge ACK 0.281 > . 1001:1001(0) ack 1001 win 501 // With or without this patch, RST will cause connection // to move to CLOSE (sequence number matches) // 0.282 < R 1001:1001(0) ack 1001 win 260 // ACK 0.300 < . 1001:1001(0) ack 1001 win 257 // more data could be exchanged here, connection // is still established // Client closes the connection. 0.610 < F. 1001:1001(0) ack 1001 win 260 0.650 > . 1001:1001(0) ack 1002 // Close the connection without reading outstanding data 0.700 close(4) = 0 // so one more reset. Will be deemed acceptable with patch as well: // connection is already closing. 0.701 > R. 1001:1001(0) ack 1002 win 501 // End packetdrill test case. With patch, this generates following conntrack events: [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED] [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] Without patch, first RST moves connection to close, whereas socket state does not change until FIN is received. [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED] [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED] [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED] Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>