aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter
AgeCommit message (Collapse)AuthorFilesLines
2022-07-21net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookupKumar Kartikeya Dwivedi1-34/+18
Move common checks inside the common function, and maintain the only difference the two being how to obtain the struct net * from ctx. No functional change intended. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21bpf: Switch to new kfunc flags infrastructureKumar Kartikeya Dwivedi1-37/+10
Instead of populating multiple sets to indicate some attribute and then researching the same BTF ID in them, prepare a single unified BTF set which indicates whether a kfunc is allowed to be called, and also its attributes if any at the same time. Now, only one call is needed to perform the lookup for both kfunc availability and its attributes. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski31-260/+385
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Simplify nf_ct_get_tuple(), from Jackie Liu. 2) Add format to request_module() call, from Bill Wendling. 3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending hardware offload objects to be processed, from Vlad Buslov. 4) Missing rcu annotation and accessors in the netfilter tree, from Florian Westphal. 5) Merge h323 conntrack helper nat hooks into single object, also from Florian. 6) A batch of update to fix sparse warnings treewide, from Florian Westphal. 7) Move nft_cmp_fast_mask() where it used, from Florian. 8) Missing const in nf_nat_initialized(), from James Yonan. 9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet. 10) Use refcount_inc instead of _inc_not_zero in flowtable, from Florian Westphal. 11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: xt_TPROXY: remove pr_debug invocations netfilter: flowtable: prefer refcount_inc netfilter: ipvs: Use the bitmap API to allocate bitmaps netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn * netfilter: nf_tables: move nft_cmp_fast_mask to where its used netfilter: nf_tables: use correct integer types netfilter: nf_tables: add and use BE register load-store helpers netfilter: nf_tables: use the correct get/put helpers netfilter: x_tables: use correct integer types netfilter: nfnetlink: add missing __be16 cast netfilter: nft_set_bitmap: Fix spelling mistake netfilter: h323: merge nat hook pointers into one netfilter: nf_conntrack: use rcu accessors where needed netfilter: nf_conntrack: add missing __rcu annotations netfilter: nf_flow_table: count pending offload workqueue tasks net/sched: act_ct: set 'net' pointer when creating new nf_flow_table netfilter: conntrack: use correct format characters netfilter: conntrack: use fallthrough to cleanup ==================== Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-21netfilter: xt_TPROXY: remove pr_debug invocationsJustin Stitt1-23/+2
pr_debug calls are no longer needed in this file. Pablo suggested "a patch to remove these pr_debug calls". This patch has some other beneficial collateral as it also silences multiple Clang -Wformat warnings that were present in the pr_debug calls. diff from v1 -> v2: * converted if statement one-liner style * x == NULL is now !x Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-21netfilter: flowtable: prefer refcount_incFlorian Westphal1-8/+3
With refcount_inc_not_zero, we'd also need a smp_rmb or similar, followed by a test of the CONFIRMED bit. However, the ct pointer is taken from skb->_nfct, its refcount must not be 0 (else, we'd already have a use-after-free bug). Use refcount_inc() instead to clarify the ct refcount is expected to be at least 1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-21netfilter: ipvs: Use the bitmap API to allocate bitmapsChristophe JAILLET1-3/+2
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-15ip: Fix data-races around sysctl_ip_default_ttl.Kuniyuki Iwashima1-1/+1
While reading sysctl_ip_default_ttl, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11netfilter: nf_tables: move nft_cmp_fast_mask to where its usedFlorian Westphal1-0/+12
... and cast result to u32 so sparse won't complain anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: nf_tables: use correct integer typesFlorian Westphal4-11/+12
Sparse tool complains about mixing of different endianess types, so use the correct ones. Add type casts where needed. objdiff shows no changes except in nft_tunnel (type is changed). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: nf_tables: add and use BE register load-store helpersFlorian Westphal1-3/+3
Same as the existing ones, no conversions. This is just for sparse sake only so that we no longer mix be16/u16 and be32/u32 types. Alternative is to add __force __beX in various places, but this seems nicer. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: nf_tables: use the correct get/put helpersFlorian Westphal4-10/+11
Switch to be16/32 and u16/32 respectively. No code changes here, the functions do the same thing, this is just for sparse checkers' sake. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: x_tables: use correct integer typesFlorian Westphal3-9/+9
Sparse complains because __be32 and u32 are mixed without conversions. Use the correct types, no code changes. Furthermore, xt_DSCP generates a bit truncation warning: "cast truncates bits from constant value (ffffff03 becomes 3)" The truncation is fine (and wanted). Add a private definition and use that instead. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: nfnetlink: add missing __be16 castFlorian Westphal1-1/+1
Sparse flags this as suspicious, because this compares integer with a be16 with no conversion. Its a compat check for old userspace that sends host byte order, so force a be16 cast here. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: nft_set_bitmap: Fix spelling mistakeZhang Jiaming1-2/+2
Change 'succesful' to 'successful'. Change 'transation' to 'transaction'. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: h323: merge nat hook pointers into oneFlorian Westphal1-161/+99
sparse complains about incorrect rcu usage. Code uses the correct rcu access primitives, but the function pointers lack rcu annotations. Collapse all of them into a single structure, then annotate the pointer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: nf_conntrack: use rcu accessors where neededFlorian Westphal7-16/+57
Sparse complains about direct access to the 'helper' and timeout members. Both have __rcu annotation, so use the accessors. xt_CT is fine, accesses occur before the structure is visible to other cpus. Switch to rcu accessors there as well to reduce noise. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: nf_conntrack: add missing __rcu annotationsFlorian Westphal3-3/+3
Access to the hook pointers use correct helpers but the pointers lack the needed __rcu annotation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: nf_flow_table: count pending offload workqueue tasksVlad Buslov5-4/+165
To improve hardware offload debuggability count pending 'add', 'del' and 'stats' flow_table offload workqueue tasks. Counters are incremented before scheduling new task and decremented when workqueue handler finishes executing. These counters allow user to diagnose congestion on hardware offload workqueues that can happen when either CPU is starved and workqueue jobs are executed at lower rate than new ones are added or when hardware/driver can't keep up with the rate. Implement the described counters as percpu counters inside new struct netns_ft which is stored inside struct net. Expose them via new procfs file '/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack' file. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: conntrack: use correct format charactersBill Wendling1-1/+1
When compiling with -Wformat, clang emits the following warnings: net/netfilter/nf_conntrack_helper.c:168:18: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security] request_module(mod_name); ^~~~~~~~ Use a string literal for the format string. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <isanbard@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11netfilter: conntrack: use fallthrough to cleanupJackie Liu1-5/+3
These cases all use the same function. we can simplify the code through fallthrough. $ size net/netfilter/nf_conntrack_core.o text data bss dec hex filename before 81601 81430 768 163799 27fd7 net/netfilter/nf_conntrack_core.o after 80361 81430 768 162559 27aff net/netfilter/nf_conntrack_core.o Arch: aarch64 Gcc : gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1) Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-09netfilter: nf_tables: replace BUG_ON by element length checkPablo Neira Ayuso1-21/+51
BUG_ON can be triggered from userspace with an element with a large userdata area. Replace it by length check and return EINVAL instead. Over time extensions have been growing in size. Pick a sufficiently old Fixes: tag to propagate this fix. Fixes: 7d7402642eaf ("netfilter: nf_tables: variable sized set element keys / data") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-09netfilter: nf_log: incorrect offset to network headerPablo Neira Ayuso1-4/+4
NFPROTO_ARP is expecting to find the ARP header at the network offset. In the particular case of ARP, HTYPE= field shows the initial bytes of the ethernet header destination MAC address. netdev out: IN= OUT=bridge0 MACSRC=c2:76:e5:71:e1:de MACDST=36:b0:4a:e2:72:ea MACPROTO=0806 ARP HTYPE=14000 PTYPE=0x4ae2 OPCODE=49782 NFPROTO_NETDEV egress hook is also expecting to find the IP headers at the network offset. Fixes: 35b9395104d5 ("netfilter: add generic ARP packet logger") Reported-by: Tom Yan <tom.ty89@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-07netfilter: conntrack: fix crash due to confirmed bit load reorderingFlorian Westphal3-0/+26
Kajetan Puchalski reports crash on ARM, with backtrace of: __nf_ct_delete_from_lists nf_ct_delete early_drop __nf_conntrack_alloc Unlike atomic_inc_not_zero, refcount_inc_not_zero is not a full barrier. conntrack uses SLAB_TYPESAFE_BY_RCU, i.e. it is possible that a 'newly' allocated object is still in use on another CPU: CPU1 CPU2 encounter 'ct' during hlist walk delete_from_lists refcount drops to 0 kmem_cache_free(ct); __nf_conntrack_alloc() // returns same object refcount_inc_not_zero(ct); /* might fail */ /* If set, ct is public/in the hash table */ test_bit(IPS_CONFIRMED_BIT, &ct->status); In case CPU1 already set refcount back to 1, refcount_inc_not_zero() will succeed. The expected possibilities for a CPU that obtained the object 'ct' (but no reference so far) are: 1. refcount_inc_not_zero() fails. CPU2 ignores the object and moves to the next entry in the list. This happens for objects that are about to be free'd, that have been free'd, or that have been reallocated by __nf_conntrack_alloc(), but where the refcount has not been increased back to 1 yet. 2. refcount_inc_not_zero() succeeds. CPU2 checks the CONFIRMED bit in ct->status. If set, the object is public/in the table. If not, the object must be skipped; CPU2 calls nf_ct_put() to un-do the refcount increment and moves to the next object. Parallel deletion from the hlists is prevented by a 'test_and_set_bit(IPS_DYING_BIT, &ct->status);' check, i.e. only one cpu will do the unlink, the other one will only drop its reference count. Because refcount_inc_not_zero is not a full barrier, CPU2 may try to delete an object that is not on any list: 1. refcount_inc_not_zero() successful (refcount inited to 1 on other CPU) 2. CONFIRMED test also successful (load was reordered or zeroing of ct->status not yet visible) 3. delete_from_lists unlinks entry not on the hlist, because IPS_DYING_BIT is 0 (already cleared). 2) is already wrong: CPU2 will handle a partially initited object that is supposed to be private to CPU1. Add needed barriers when refcount_inc_not_zero() is successful. It also inserts a smp_wmb() before the refcount is set to 1 during allocation. Because other CPU might still see the object, refcount_set(1) "resurrects" it, so we need to make sure that other CPUs will also observe the right content. In particular, the CONFIRMED bit test must only pass once the object is fully initialised and either in the hash or about to be inserted (with locks held to delay possible unlink from early_drop or gc worker). I did not change flow_offload_alloc(), as far as I can see it should call refcount_inc(), not refcount_inc_not_zero(): the ct object is attached to the skb so its refcount should be >= 1 in all cases. v2: prefer smp_acquire__after_ctrl_dep to smp_rmb (Will Deacon). v3: keep smp_acquire__after_ctrl_dep close to refcount_inc_not_zero call add comment in nf_conntrack_netlink, no control dependency there due to locks. Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/Yr7WTfd6AVTQkLjI@e126311.manchester.arm.com/ Reported-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Diagnosed-by: Will Deacon <will@kernel.org> Fixes: 719774377622 ("netfilter: conntrack: convert to refcount_t api") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Will Deacon <will@kernel.org>
2022-07-02netfilter: nft_set_pipapo: release elements in clone from abort pathPablo Neira Ayuso1-15/+33
New elements that reside in the clone are not released in case that the transaction is aborted. [16302.231754] ------------[ cut here ]------------ [16302.231756] WARNING: CPU: 0 PID: 100509 at net/netfilter/nf_tables_api.c:1864 nf_tables_chain_destroy+0x26/0x127 [nf_tables] [...] [16302.231882] CPU: 0 PID: 100509 Comm: nft Tainted: G W 5.19.0-rc3+ #155 [...] [16302.231887] RIP: 0010:nf_tables_chain_destroy+0x26/0x127 [nf_tables] [16302.231899] Code: f3 fe ff ff 41 55 41 54 55 53 48 8b 6f 10 48 89 fb 48 c7 c7 82 96 d9 a0 8b 55 50 48 8b 75 58 e8 de f5 92 e0 83 7d 50 00 74 09 <0f> 0b 5b 5d 41 5c 41 5d c3 4c 8b 65 00 48 8b 7d 08 49 39 fc 74 05 [...] [16302.231917] Call Trace: [16302.231919] <TASK> [16302.231921] __nf_tables_abort.cold+0x23/0x28 [nf_tables] [16302.231934] nf_tables_abort+0x30/0x50 [nf_tables] [16302.231946] nfnetlink_rcv_batch+0x41a/0x840 [nfnetlink] [16302.231952] ? __nla_validate_parse+0x48/0x190 [16302.231959] nfnetlink_rcv+0x110/0x129 [nfnetlink] [16302.231963] netlink_unicast+0x211/0x340 [16302.231969] netlink_sendmsg+0x21e/0x460 Add nft_set_pipapo_match_destroy() helper function to release the elements in the lookup tables. Stefano Brivio says: "We additionally look for elements pointers in the cloned matching data if priv->dirty is set, because that means that cloned data might point to additional elements we did not commit to the working copy yet (such as the abort path case, but perhaps not limited to it)." Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-02netfilter: nf_tables: stricter validation of element dataPablo Neira Ayuso1-1/+8
Make sure element data type and length do not mismatch the one specified by the set declaration. Fixes: 7d7402642eaf ("netfilter: nf_tables: variable sized set element keys / data") Reported-by: Hugues ANGUELKOV <hanguelkov@randorisec.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-27netfilter: nf_tables: avoid skb access on nf_stolenFlorian Westphal2-23/+45
When verdict is NF_STOLEN, the skb might have been freed. When tracing is enabled, this can result in a use-after-free: 1. access to skb->nf_trace 2. access to skb->mark 3. computation of trace id 4. dump of packet payload To avoid 1, keep a cached copy of skb->nf_trace in the trace state struct. Refresh this copy whenever verdict is != STOLEN. Avoid 2 by skipping skb->mark access if verdict is STOLEN. 3 is avoided by precomputing the trace id. Only dump the packet when verdict is not "STOLEN". Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-27netfilter: nft_dynset: restore set element counter when failing to updatePablo Neira Ayuso1-0/+2
This patch fixes a race condition. nft_rhash_update() might fail for two reasons: - Element already exists in the hashtable. - Another packet won race to insert an entry in the hashtable. In both cases, new() has already bumped the counter via atomic_add_unless(), therefore, decrement the set element counter. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-21netfilter: nf_dup_netdev: add and use recursion counterFlorian Westphal1-4/+15
Now that the egress function can be called from egress hook, we need to avoid recursive calls into the nf_tables traverser, else crash. Fixes: f87b9464d152 ("netfilter: nft_fwd_netdev: Support egress hook") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-21netfilter: nf_dup_netdev: do not push mac header a second timeFlorian Westphal1-4/+10
Eric reports skb_under_panic when using dup/fwd via bond+egress hook. Before pushing mac header, we should make sure that we're called from ingress to put back what was pulled earlier. In egress case, the MAC header is already there; we should leave skb alone. While at it be more careful here: skb might have been altered and headroom reduced, so add a skb_cow() before so that headroom is increased if necessary. nf_do_netdev_egress() assumes skb ownership (it normally ends with a call to dev_queue_xmit), so we must free the packet on error. Fixes: f87b9464d152 ("netfilter: nft_fwd_netdev: Support egress hook") Reported-by: Eric Garver <eric@garver.life> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-17netfilter: cttimeout: fix slab-out-of-bounds read typo in cttimeout_net_exitFlorian Westphal1-1/+1
syzbot reports: BUG: KASAN: slab-out-of-bounds in __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42 [..] list_del include/linux/list.h:148 [inline] cttimeout_net_exit+0x211/0x540 net/netfilter/nfnetlink_cttimeout.c:617 Problem is the wrong name of the list member, so container_of() result is wrong. Reported-by: <syzbot+92968395eedbdbd3617d@syzkaller.appspotmail.com> Fixes: 78222bacfca9 ("netfilter: cttimeout: decouple unlink and free on netns destruction") Signed-off-by: Florian Westphal <fw@strlen.de>
2022-06-08netfilter: use get_random_u32 instead of prandomFlorian Westphal2-20/+5
bh might occur while updating per-cpu rnd_state from user context, ie. local_out path. BUG: using smp_processor_id() in preemptible [00000000] code: nginx/2725 caller is nft_ng_random_eval+0x24/0x54 [nft_numgen] Call Trace: check_preemption_disabled+0xde/0xe0 nft_ng_random_eval+0x24/0x54 [nft_numgen] Use the random driver instead, this also avoids need for local prandom state. Moreover, prandom now uses the random driver since d4150779e60f ("random32: use real rng for non-deterministic randomness"). Based on earlier patch from Pablo Neira. Fixes: 6b2faee0ca91 ("netfilter: nft_meta: place prandom handling in a helper") Fixes: 978d8f9055c3 ("netfilter: nft_numgen: add map lookups for numgen random operations") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-06netfilter: nf_tables: bail out early if hardware offload is not supportedPablo Neira Ayuso2-2/+23
If user requests for NFT_CHAIN_HW_OFFLOAD, then check if either device provides the .ndo_setup_tc interface or there is an indirect flow block that has been registered. Otherwise, bail out early from the preparation phase. Moreover, validate that family == NFPROTO_NETDEV and hook is NF_NETDEV_INGRESS. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-06netfilter: nf_tables: memleak flow rule from commit pathPablo Neira Ayuso1-0/+6
Abort path release flow rule object, however, commit path does not. Update code to destroy these objects before releasing the transaction. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-06netfilter: nf_tables: release new hooks on unsupported flowtable flagsPablo Neira Ayuso1-4/+8
Release the list of new hooks that are pending to be registered in case that unsupported flowtable flags are provided. Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-02netfilter: nf_tables: always initialize flowtable hook list in transactionPablo Neira Ayuso1-0/+1
The hook list is used if nft_trans_flowtable_update(trans) == true. However, initialize this list for other cases for safety reasons. Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-02netfilter: nf_tables: delete flowtable hooks via transaction listPablo Neira Ayuso1-25/+6
Remove inactive bool field in nft_hook object that was introduced in abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable"). Move stale flowtable hooks to transaction list instead. Deleting twice the same device does not result in ENOENT. Fixes: abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-01netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net pathPablo Neira Ayuso1-1/+1
Use kfree_rcu(ptr, rcu) variant instead as described by ae089831ff28 ("netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant"). Fixes: f9a43007d3f7 ("netfilter: nf_tables: double hook unregistration in netns path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-01netfilter: nat: really support inet nat without l3 addressFlorian Westphal1-1/+2
When no l3 address is given, priv->family is set to NFPROTO_INET and the evaluation function isn't called. Call it too so l4-only rewrite can work. Also add a test case for this. Fixes: a33f387ecd5aa ("netfilter: nft_nat: allow to specify layer 4 protocol NAT only") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-31netfilter: flowtable: fix nft_flow_route source address for nat casewenxu1-2/+2
For snat and dnat cases, the saddr should be taken from reverse tuple. Fixes: 3412e1641828 (netfilter: flowtable: nft_flow_route use more data for reverse route) Signed-off-by: wenxu <wenxu@chinatelecom.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-31netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flagwenxu1-0/+2
The nf_flow_table gets route through ip_route_output_key. If the saddr is not local one, then FLOWI_FLAG_ANYSRC flag should be set. Without this flag, the route lookup for other_dst will fail. Fixes: 3412e1641828 (netfilter: flowtable: nft_flow_route use more data for reverse route) Signed-off-by: wenxu <wenxu@chinatelecom.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-31netfilter: nf_tables: double hook unregistration in netns pathPablo Neira Ayuso1-13/+41
__nft_release_hooks() is called from pre_netns exit path which unregisters the hooks, then the NETDEV_UNREGISTER event is triggered which unregisters the hooks again. [ 565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270 [...] [ 565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G E 5.18.0-rc7+ #27 [ 565.253682] Workqueue: netns cleanup_net [ 565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270 [...] [ 565.297120] Call Trace: [ 565.300900] <TASK> [ 565.304683] nf_tables_flowtable_event+0x16a/0x220 [nf_tables] [ 565.308518] raw_notifier_call_chain+0x63/0x80 [ 565.312386] unregister_netdevice_many+0x54f/0xb50 Unregister and destroy netdev hook from netns pre_exit via kfree_rcu so the NETDEV_UNREGISTER path see unregistered hooks. Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-31netfilter: nf_tables: hold mutex on netns pre_exit pathPablo Neira Ayuso1-0/+4
clean_net() runs in workqueue while walking over the lists, grab mutex. Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-31netfilter: nf_tables: sanitize nft_set_desc_concat_parse()Pablo Neira Ayuso1-4/+13
Add several sanity checks for nft_set_desc_concat_parse(): - validate desc->field_count not larger than desc->field_len array. - field length cannot be larger than desc->field_len (ie. U8_MAX) - total length of the concatenation cannot be larger than register array. Joint work with Florian Westphal. Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields") Reported-by: <zhangziming.zzm@antgroup.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-27netfilter: nf_tables: set element extended ACK reporting supportPablo Neira Ayuso1-3/+9
Report the element that causes problems via netlink extended ACK for set element commands. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-27netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exitFlorian Westphal1-2/+3
syzbot reports: BUG: KASAN: slab-out-of-bounds in __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42 [..] list_del include/linux/list.h:148 [inline] cttimeout_net_exit+0x211/0x540 net/netfilter/nfnetlink_cttimeout.c:617 No reproducer so far. Looking at recent changes in this area its clear that the free_head must not be at the end of the structure because nf_ct_timeout structure has variable size. Reported-by: <syzbot+92968395eedbdbd3617d@syzkaller.appspotmail.com> Fixes: 78222bacfca9 ("netfilter: cttimeout: decouple unlink and free on netns destruction") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-27netfilter: nfnetlink: fix warn in nfnetlink_unbindFlorian Westphal1-19/+5
syzbot reports following warn: WARNING: CPU: 0 PID: 3600 at net/netfilter/nfnetlink.c:703 nfnetlink_unbind+0x357/0x3b0 net/netfilter/nfnetlink.c:694 The syzbot generated program does this: socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3 setsockopt(3, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP, [1], 4) = 0 ... which triggers 'WARN_ON_ONCE(nfnlnet->ctnetlink_listeners == 0)' check. Instead of counting, just enable reporting for every bind request and check if we still have listeners on unbind. While at it, also add the needed bounds check on nfnl_group2type[] access. Reported-by: <syzbot+4903218f7fba0a2d6226@syzkaller.appspotmail.com> Reported-by: <syzbot+afd2d80e495f96049571@syzkaller.appspotmail.com> Fixes: 2794cdb0b97b ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-26netfilter: nft_limit: Clone packet limits' cost valuePhil Sutter1-0/+2
When cloning a packet-based limit expression, copy the cost value as well. Otherwise the new limit is not functional anymore. Fixes: 3b9e2ea6c11bf ("netfilter: nft_limit: move stateful fields out of expression data") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-26netfilter: nf_tables: disallow non-stateful expression in sets earlierPablo Neira Ayuso1-9/+10
Since 3e135cd499bf ("netfilter: nft_dynset: dynamic stateful expression instantiation"), it is possible to attach stateful expressions to set elements. cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") introduces conditional destruction on the object to accomodate transaction semantics. nft_expr_init() calls expr->ops->init() first, then check for NFT_STATEFUL_EXPR, this stills allows to initialize a non-stateful lookup expressions which points to a set, which might lead to UAF since the set is not properly detached from the set->binding for this case. Anyway, this combination is non-sense from nf_tables perspective. This patch fixes this problem by checking for NFT_STATEFUL_EXPR before expr->ops->init() is called. The reporter provides a KASAN splat and a poc reproducer (similar to those autogenerated by syzbot to report use-after-free errors). It is unknown to me if they are using syzbot or if they use similar automated tool to locate the bug that they are reporting. For the record, this is the KASAN splat. [ 85.431824] ================================================================== [ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20 [ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776 [ 85.434756] [ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2 [ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Fixes: 0b2d8a7b638b ("netfilter: nf_tables: add helper functions for expression handling") Reported-and-tested-by: Aaron Adams <edg-e@nccgroup.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski3-48/+17
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, misc updates and fallout fixes from recent Florian's code rewritting (from last pull request): 1) Use new flowi4_l3mdev field in ip_route_me_harder(), from Martin Willi. 2) Avoid unnecessary GC with a timestamp in conncount, from William Tu and Yifeng Sun. 3) Remove TCP conntrack debugging, from Florian Westphal. 4) Fix compilation warning in ctnetlink, from Florian. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: ctnetlink: fix up for "netfilter: conntrack: remove unconfirmed list" netfilter: conntrack: remove pr_debug callsites from tcp tracker netfilter: nf_conncount: reduce unnecessary GC netfilter: Use l3mdev flow key when re-routing mangled packets ==================== Link: https://lore.kernel.org/r/20220519220206.722153-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>