aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-01-10netfilter: nf_tables: don't use 'data_size' uninitializedLinus Torvalds1-0/+1
Commit 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout") never initialized the new 'data_size' variable. I'm not sure how it ever worked, but it might have worked almost by accident - gcc seems to occasionally miss these kinds of 'variable used uninitialized' situations, but I've seen it do so because it ended up zero-initializing them due to some other simplification. But clang is very unhappy about it all, and correctly reports net/netfilter/nf_tables_api.c:8278:4: error: variable 'data_size' is uninitialized when used here [-Werror,-Wuninitialized] data_size += sizeof(*prule) + rule->dlen; ^~~~~~~~~ net/netfilter/nf_tables_api.c:8263:30: note: initialize the variable 'data_size' to silence this warning unsigned int size, data_size; ^ = 0 1 error generated. and this fix just initializes 'data_size' to zero before the loop. Fixes: 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-0/+11
Merge in fixes directly in prep for the 5.17 merge window. No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextJakub Kicinski29-285/+786
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next. This includes one patch to update ovs and act_ct to use nf_ct_put() instead of nf_conntrack_put(). 1) Add netns_tracker to nfnetlink_log and masquerade, from Eric Dumazet. 2) Remove redundant rcu read-size lock in nf_tables packet path. 3) Replace BUG() by WARN_ON_ONCE() in nft_payload. 4) Consolidate rule verdict tracing. 5) Replace WARN_ON() by WARN_ON_ONCE() in nf_tables core. 6) Make counter support built-in in nf_tables. 7) Add new field to conntrack object to identify locally generated traffic, from Florian Westphal. 8) Prevent NAT from shadowing well-known ports, from Florian Westphal. 9) Merge nf_flow_table_{ipv4,ipv6} into nf_flow_table_inet, also from Florian. 10) Remove redundant pointer in nft_pipapo AVX2 support, from Colin Ian King. 11) Replace opencoded max() in conntrack, from Jiapeng Chong. 12) Update conntrack to use refcount_t API, from Florian Westphal. 13) Move ip_ct_attach indirection into the nf_ct_hook structure. 14) Constify several pointer object in the netfilter codebase, from Florian Westphal. 15) Tree-wide replacement of nf_conntrack_put() by nf_ct_put(), also from Florian. 16) Fix egress splat due to incorrect rcu notation, from Florian. 17) Move stateful fields of connlimit, last, quota, numgen and limit out of the expression data area. 18) Build a blob to represent the ruleset in nf_tables, this is a requirement of the new register tracking infrastructure. 19) Add NFT_REG32_NUM to define the maximum number of 32-bit registers. 20) Add register tracking infrastructure to skip redundant store-to-register operations, this includes support for payload, meta and bitwise expresssions. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: (32 commits) netfilter: nft_meta: cancel register tracking after meta update netfilter: nft_payload: cancel register tracking after payload update netfilter: nft_bitwise: track register operations netfilter: nft_meta: track register operations netfilter: nft_payload: track register operations netfilter: nf_tables: add register tracking infrastructure netfilter: nf_tables: add NFT_REG32_NUM netfilter: nf_tables: add rule blob layout netfilter: nft_limit: move stateful fields out of expression data netfilter: nft_limit: rename stateful structure netfilter: nft_numgen: move stateful fields out of expression data netfilter: nft_quota: move stateful fields out of expression data netfilter: nft_last: move stateful fields out of expression data netfilter: nft_connlimit: move stateful fields out of expression data netfilter: egress: avoid a lockdep splat net: prefer nf_ct_put instead of nf_conntrack_put netfilter: conntrack: avoid useless indirection during conntrack destruction netfilter: make function op structures const netfilter: core: move ip_ct_attach indirection to struct nf_ct_hook netfilter: conntrack: convert to refcount_t api ... ==================== Link: https://lore.kernel.org/r/20220109231640.104123-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09netfilter: nft_meta: cancel register tracking after meta updatePablo Neira Ayuso1-0/+20
The meta expression might mangle the packet metadata, cancel register tracking since any metadata in the registers is stale. Finer grain register tracking cancellation by inspecting the meta type on the register is also possible. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_payload: cancel register tracking after payload updatePablo Neira Ayuso1-0/+21
The payload expression might mangle the packet, cancel register tracking since any payload data in the registers is stale. Finer grain register tracking cancellation by inspecting the payload base, offset and length on the register is also possible. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_bitwise: track register operationsPablo Neira Ayuso3-2/+97
Check if the destination register already contains the data that this bitwise expression performs. This allows to skip this redundant operation. If the destination contains a different bitwise operation, cancel the register tracking information. If the destination contains no bitwise operation, update the register tracking information. Update the payload and meta expression to check if this bitwise operation has been already performed on the register. Hence, both the payload/meta and the bitwise expressions are reduced. There is also a special case: If source register != destination register and source register is not updated by a previous bitwise operation, then transfer selector from the source register to the destination register. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_meta: track register operationsPablo Neira Ayuso1-0/+28
Check if the destination register already contains the data that this meta store expression performs. This allows to skip this redundant operation. If the destination contains a different selector, update the register tracking information. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_payload: track register operationsPablo Neira Ayuso1-0/+30
Check if the destination register already contains the data that this payload store expression performs. This allows to skip this redundant operation. If the destination contains a different selector, update the register tracking information. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nf_tables: add register tracking infrastructurePablo Neira Ayuso1-0/+11
This patch adds new infrastructure to skip redundant selector store operations on the same register to achieve a performance boost from the packet path. This is particularly noticeable in pure linear rulesets but it also helps in rulesets which are already heaving relying in maps to avoid ruleset linear inspection. The idea is to keep data of the most recurrent store operations on register to reuse them with cmp and lookup expressions. This infrastructure allows for dynamic ruleset updates since the ruleset blob reduction happens from the kernel. Userspace still needs to be updated to maximize register utilization to cooperate to improve register data reuse / reduce number of store on register operations. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nf_tables: add rule blob layoutPablo Neira Ayuso3-63/+129
This patch adds a blob layout per chain to represent the ruleset in the packet datapath. size (unsigned long) struct nft_rule_dp struct nft_expr ... struct nft_rule_dp struct nft_expr ... struct nft_rule_dp (is_last=1) The new structure nft_rule_dp represents the rule in a more compact way (smaller memory footprint) compared to the control-plane nft_rule structure. The ruleset blob is a read-only data structure. The first field contains the blob size, then the rules containing expressions. There is a trailing rule which is used by the tracing infrastructure which is equivalent to the NULL rule marker in the previous representation. The blob size field does not include the size of this trailing rule marker. The ruleset blob is generated from the commit path. This patch reuses the infrastructure available since 0cbc06b3faba ("netfilter: nf_tables: remove synchronize_rcu in commit phase") to build the array of rules per chain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_limit: move stateful fields out of expression dataPablo Neira Ayuso1-12/+82
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_limit: rename stateful structurePablo Neira Ayuso1-52/+52
From struct nft_limit to nft_limit_priv. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_numgen: move stateful fields out of expression dataPablo Neira Ayuso1-6/+28
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_quota: move stateful fields out of expression dataPablo Neira Ayuso1-5/+47
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_last: move stateful fields out of expression dataPablo Neira Ayuso1-18/+51
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_connlimit: move stateful fields out of expression dataPablo Neira Ayuso1-8/+18
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09net: prefer nf_ct_put instead of nf_conntrack_putFlorian Westphal1-2/+2
Its the same as nf_conntrack_put(), but without the need for an indirect call. The downside is a module dependency on nf_conntrack, but all of these already depend on conntrack anyway. Cc: Paul Blakey <paulb@mellanox.com> Cc: dev@openvswitch.org Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: conntrack: avoid useless indirection during conntrack destructionFlorian Westphal1-6/+6
nf_ct_put() results in a usesless indirection: nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu readlock + indirect call of ct_hooks->destroy(). There are two _put helpers: nf_ct_put and nf_conntrack_put. The latter is what should be used in code that MUST NOT cause a linker dependency on the conntrack module (e.g. calls from core network stack). Everyone else should call nf_ct_put() instead. A followup patch will convert a few nf_conntrack_put() calls to nf_ct_put(), in particular from modules that already have a conntrack dependency such as act_ct or even nf_conntrack itself. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: make function op structures constFlorian Westphal5-14/+14
No functional changes, these structures should be const. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: core: move ip_ct_attach indirection to struct nf_ct_hookFlorian Westphal2-14/+9
ip_ct_attach predates struct nf_ct_hook, we can place it there and remove the exported symbol. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: conntrack: convert to refcount_t apiFlorian Westphal8-27/+23
Convert nf_conn reference counting from atomic_t to refcount_t based api. refcount_t api provides more runtime sanity checks and will warn on certain constructs, e.g. refcount_inc() on a zero reference count, which usually indicates use-after-free. For this reason template allocation is changed to init the refcount to 1, the subsequenct add operations are removed. Likewise, init_conntrack() is changed to set the initial refcount to 1 instead refcount_inc(). This is safe because the new entry is not (yet) visible to other cpus. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-06netfilter: nft_set_pipapo: allocate pcpu scratch maps on cloneFlorian Westphal1-0/+8
This is needed in case a new transaction is made that doesn't insert any new elements into an already existing set. Else, after second 'nft -f ruleset.txt', lookups in such a set will fail because ->lookup() encounters raw_cpu_ptr(m->scratch) == NULL. For the initial rule load, insertion of elements takes care of the allocation, but for rule reloads this isn't guaranteed: we might not have additions to the set. Fixes: 3c4287f62044a90e ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: etkaar <lists.netfilter.org@prvy.eu> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-06netfilter: nft_payload: do not update layer 4 checksum when mangling fragmentsPablo Neira Ayuso1-0/+3
IP fragments do not come with the transport header, hence skip bogus layer 4 checksum updates. Fixes: 1814096980bb ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields") Reported-and-tested-by: Steffen Weinreich <steve@weinreich.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-04net/sched: act_ct: Fill offloading tuple iifidxPaul Blakey1-1/+5
Driver offloading ct tuples can use the information of which devices received the packets that created the offloaded connections, to more efficiently offload them only to the relevant device. Add new act_ct nf conntrack extension, which is used to store the skb devices before offloading the connection, and then fill in the tuple iifindex so drivers can get the device via metadata dissector match. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2-0/+2
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-29net: Don't include filter.h from net/sock.hJakub Kicinski2-0/+2
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-24netfilter: nft_set_pipapo_avx2: remove redundant pointer ltColin Ian King1-3/+1
The pointer lt is being assigned a value and then later updated but that value is never read. The pointer is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-6/+9
include/net/sock.h commit 8f905c0e7354 ("inet: fully convert sk->sk_rx_dst to RCU rules") commit 43f51df41729 ("net: move early demux fields close to sk_refcnt") https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23netfilter: flowtable: remove ipv4/ipv6 modulesFlorian Westphal1-0/+26
Just place the structs and registration in the inet module. nf_flow_table_ipv6, nf_flow_table_ipv4 and nf_flow_table_inet share same module dependencies: nf_flow_table, nf_tables. before: text data bss dec hex filename 2278 1480 0 3758 eae nf_flow_table_inet.ko 1159 1352 0 2511 9cf nf_flow_table_ipv6.ko 1154 1352 0 2506 9ca nf_flow_table_ipv4.ko after: 2369 1672 0 4041 fc9 nf_flow_table_inet.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23netfilter: nat: force port remap to prevent shadowing well-known portsFlorian Westphal1-3/+40
If destination port is above 32k and source port below 16k assume this might cause 'port shadowing' where a 'new' inbound connection matches an existing one, e.g. inbound X:41234 -> Y:53 matches existing conntrack entry Z:53 -> X:4123, where Z got natted to X. In this case, new packet is natted to Z:53 which is likely unwanted. We avoid the rewrite for connections that originate from local host: port-shadowing is only possible with forwarded connections. Also adjust test case. v3: no need to call tuple_force_port_remap if already in random mode (Phil) Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc> Acked-by: Eric Garver <eric@garver.life> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23netfilter: conntrack: tag conntracks picked up in local out hookFlorian Westphal1-0/+3
This allows to identify flows that originate from local machine in a followup patch. It would be possible to make this a ->status bit instead. For now I did not do that yet because I don't have a use-case for exposing this info to userspace. If one comes up the toggle can be replaced with a status bit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23netfilter: nf_tables: make counter support built-inPablo Neira Ayuso4-51/+21
Make counter support built-in to allow for direct call in case of CONFIG_RETPOLINE. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23netfilter: nf_tables: replace WARN_ON by WARN_ON_ONCE for unknown verdictsPablo Neira Ayuso1-1/+1
Bug might trigger warning for each packet, call WARN_ON_ONCE instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23netfilter: nf_tables: consolidate rule verdict trace callPablo Neira Ayuso1-7/+32
Add function to consolidate verdict tracing. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23netfilter: nft_payload: WARN_ON_ONCE instead of BUGPablo Neira Ayuso1-2/+4
BUG() is too harsh for unknown payload base, use WARN_ON_ONCE() instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23netfilter: nf_tables: remove rcu read-size lockPablo Neira Ayuso1-2/+0
Chain stats are updated from the Netfilter hook path which already run under rcu read-size lock section. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16netfilter: ctnetlink: remove expired entries firstFlorian Westphal1-2/+3
When dumping conntrack table to userspace via ctnetlink, check if the ct has already expired before doing any of the 'skip' checks. This expires dead entries faster. /proc handler also removes outdated entries first. Reported-by: Vitaly Zuevsky <vzuevsky@ns1.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16netfilter: nf_nat_masquerade: add netns refcount tracker to masq_dev_workEric Dumazet1-1/+3
If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y, using put_net_track() in iterate_cleanup_work() and netns_tracker_alloc() in nf_nat_masq_schedule() might help us finding netns refcount imbalances. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16netfilter: nfnetlink: add netns refcount tracker to struct nfulnl_instanceEric Dumazet1-2/+3
If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y, using put_net_track() in nfulnl_instance_free_rcu() and get_net_track() in instance_create() might help us finding netns refcount imbalances. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextJakub Kicinski5-18/+9
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, mostly rather small housekeeping patches: 1) Remove unused variable in IPVS, from GuoYong Zheng. 2) Use memset_after in conntrack, from Kees Cook. 3) Remove leftover function in nfnetlink_queue, from Florian Westphal. 4) Remove redundant test on bool in conntrack, from Bernard Zhao. 5) egress support for nft_fwd, from Lukas Wunner. 6) Make pppoe work for br_netfilter, from Florian Westphal. 7) Remove unused variable in conntrack resize routine, from luo penghao. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: conntrack: Remove useless assignment statements netfilter: bridge: add support for pppoe filtering netfilter: nft_fwd_netdev: Support egress hook netfilter: ctnetlink: remove useless type conversion to bool netfilter: nf_queue: remove leftover synchronize_rcu netfilter: conntrack: Use memset_startat() to zero struct nf_conn ipvs: remove unused variable for ip_vs_new_dest ==================== Link: https://lore.kernel.org/r/20211215234911.170741-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16netfilter: conntrack: Remove useless assignment statementsluo penghao1-1/+0
The old_size assignment here will not be used anymore The clang_analyzer complains as follows: Value stored to 'old_size' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16netfilter: fix regression in looped (broad|multi)cast's MAC handlingIgnacy Gawędzki2-2/+4
In commit 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared"), the test for non-empty MAC header introduced in commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") has been replaced with a test for a set MAC header. This breaks the case when the MAC header has been reset (using skb_reset_mac_header), as is the case with looped-back multicast packets. As a result, the packets ending up in NFQUEUE get a bogus hwaddr interpreted from the first bytes of the IP header. This patch adds a test for a non-empty MAC header in addition to the test for a set MAC header. The same two tests are also implemented in nfnetlink_log.c, where the initial code of commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") has not been touched, but where supposedly the same situation may happen. Fixes: 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared") Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-15netfilter: nf_tables: fix use-after-free in nft_set_catchall_destroy()Eric Dumazet1-2/+2
We need to use list_for_each_entry_safe() iterator because we can not access @catchall after kfree_rcu() call. syzbot reported: BUG: KASAN: use-after-free in nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline] BUG: KASAN: use-after-free in nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] BUG: KASAN: use-after-free in nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493 Read of size 8 at addr ffff8880716e5b80 by task syz-executor.3/8871 CPU: 1 PID: 8871 Comm: syz-executor.3 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x2ed mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:450 nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline] nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493 __nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626 nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 blocking_notifier_call_chain kernel/notifier.c:318 [inline] blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306 netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788 __sock_release+0xcd/0x280 net/socket.c:649 sock_close+0x18/0x20 net/socket.c:1314 __fput+0x286/0x9f0 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f75fbf28adb Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 63 fc ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 fc ff ff 8b 44 RSP: 002b:00007ffd8da7ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f75fbf28adb RDX: 00007f75fc08e828 RSI: ffffffffffffffff RDI: 0000000000000003 RBP: 00007f75fc08a960 R08: 0000000000000000 R09: 00007f75fc08e830 R10: 00007ffd8da7ed10 R11: 0000000000000293 R12: 00000000002067c3 R13: 00007ffd8da7ed10 R14: 00007f75fc088f60 R15: 0000000000000032 </TASK> Allocated by task 8886: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:434 [inline] ____kasan_kmalloc mm/kasan/common.c:513 [inline] ____kasan_kmalloc mm/kasan/common.c:472 [inline] __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:522 kasan_kmalloc include/linux/kasan.h:269 [inline] kmem_cache_alloc_trace+0x1ea/0x4a0 mm/slab.c:3575 kmalloc include/linux/slab.h:590 [inline] nft_setelem_catchall_insert net/netfilter/nf_tables_api.c:5544 [inline] nft_setelem_insert net/netfilter/nf_tables_api.c:5562 [inline] nft_add_set_elem+0x232e/0x2f40 net/netfilter/nf_tables_api.c:5936 nf_tables_newsetelem+0x6ff/0xbb0 net/netfilter/nf_tables_api.c:6032 nfnetlink_rcv_batch+0x1710/0x25f0 net/netfilter/nfnetlink.c:513 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 15335: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0xd1/0x110 mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:235 [inline] __cache_free mm/slab.c:3445 [inline] kmem_cache_free_bulk+0x67/0x1e0 mm/slab.c:3766 kfree_bulk include/linux/slab.h:446 [inline] kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3273 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Last potentially related work creation: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 __kasan_record_aux_stack+0xb5/0xe0 mm/kasan/generic.c:348 kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3550 nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4489 [inline] nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] nft_set_destroy+0x34a/0x4f0 net/netfilter/nf_tables_api.c:4493 __nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626 nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 blocking_notifier_call_chain kernel/notifier.c:318 [inline] blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306 netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788 __sock_release+0xcd/0x280 net/socket.c:649 sock_close+0x18/0x20 net/socket.c:1314 __fput+0x286/0x9f0 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff8880716e5b80 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff8880716e5b80, ffff8880716e5bc0) The buggy address belongs to the page: page:ffffea0001c5b940 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880716e5c00 pfn:0x716e5 flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 ffffea0000911848 ffffea00007c4d48 ffff888010c40200 raw: ffff8880716e5c00 ffff8880716e5000 000000010000001e 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x242040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE), pid 3638, ts 211086074437, free_ts 211031029429 prep_new_page mm/page_alloc.c:2418 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369 __alloc_pages_node include/linux/gfp.h:570 [inline] kmem_getpages mm/slab.c:1377 [inline] cache_grow_begin+0x75/0x470 mm/slab.c:2593 cache_alloc_refill+0x27f/0x380 mm/slab.c:2965 ____cache_alloc mm/slab.c:3048 [inline] ____cache_alloc mm/slab.c:3031 [inline] __do_cache_alloc mm/slab.c:3275 [inline] slab_alloc mm/slab.c:3316 [inline] __do_kmalloc mm/slab.c:3700 [inline] __kmalloc+0x3b3/0x4d0 mm/slab.c:3711 kmalloc include/linux/slab.h:595 [inline] kzalloc include/linux/slab.h:724 [inline] tomoyo_get_name+0x234/0x480 security/tomoyo/memory.c:173 tomoyo_parse_name_union+0xbc/0x160 security/tomoyo/util.c:260 tomoyo_update_path_number_acl security/tomoyo/file.c:687 [inline] tomoyo_write_file+0x629/0x7f0 security/tomoyo/file.c:1034 tomoyo_write_domain2+0x116/0x1d0 security/tomoyo/common.c:1152 tomoyo_add_entry security/tomoyo/common.c:2042 [inline] tomoyo_supervisor+0xbc7/0xf00 security/tomoyo/common.c:2103 tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline] tomoyo_path_number_perm+0x419/0x590 security/tomoyo/file.c:734 security_file_ioctl+0x50/0xb0 security/security.c:1541 __do_sys_ioctl fs/ioctl.c:868 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0xb3/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1338 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389 free_unref_page_prepare mm/page_alloc.c:3309 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3388 slab_destroy mm/slab.c:1627 [inline] slabs_destroy+0x89/0xc0 mm/slab.c:1647 cache_flusharray mm/slab.c:3418 [inline] ___cache_free+0x4cc/0x610 mm/slab.c:3480 qlink_free mm/kasan/quarantine.c:146 [inline] qlist_free_all+0x4e/0x110 mm/kasan/quarantine.c:165 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272 __kasan_slab_alloc+0x97/0xb0 mm/kasan/common.c:444 kasan_slab_alloc include/linux/kasan.h:259 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slab.c:3261 [inline] kmem_cache_alloc_node+0x2ea/0x590 mm/slab.c:3599 __alloc_skb+0x215/0x340 net/core/skbuff.c:414 alloc_skb include/linux/skbuff.h:1126 [inline] nlmsg_new include/net/netlink.h:953 [inline] rtmsg_ifinfo_build_skb+0x72/0x1a0 net/core/rtnetlink.c:3808 rtmsg_ifinfo_event net/core/rtnetlink.c:3844 [inline] rtmsg_ifinfo_event net/core/rtnetlink.c:3835 [inline] rtmsg_ifinfo+0x83/0x120 net/core/rtnetlink.c:3853 netdev_state_change net/core/dev.c:1395 [inline] netdev_state_change+0x114/0x130 net/core/dev.c:1386 linkwatch_do_dev+0x10e/0x150 net/core/link_watch.c:167 __linkwatch_run_queue+0x233/0x6a0 net/core/link_watch.c:213 linkwatch_event+0x4a/0x60 net/core/link_watch.c:252 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 Memory state around the buggy address: ffff8880716e5a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880716e5b00: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc >ffff8880716e5b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff8880716e5c00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880716e5c80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski6-12/+15
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08netfilter: conntrack: annotate data-races around ct->timeoutEric Dumazet3-6/+6
(struct nf_conn)->timeout can be read/written locklessly, add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing. BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0: __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563 init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635 resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline] nf_hook_slow+0x72/0x170 net/netfilter/core.c:619 nf_hook include/linux/netfilter.h:262 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline] tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864 tcp_push_pending_frames include/net/tcp.h:1897 [inline] tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452 tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0xf2/0x270 net/core/sock.c:2768 release_sock+0x40/0x110 net/core/sock.c:3300 sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145 tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] __sys_sendto+0x21e/0x2c0 net/socket.c:2036 __do_sys_sendto net/socket.c:2048 [inline] __se_sys_sendto net/socket.c:2044 [inline] __x64_sys_sendto+0x74/0x90 net/socket.c:2044 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1: nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline] ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline] __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807 resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline] nf_hook_slow+0x72/0x170 net/netfilter/core.c:619 nf_hook include/linux/netfilter.h:262 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402 __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956 tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962 __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478 tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline] tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0xf2/0x270 net/core/sock.c:2768 release_sock+0x40/0x110 net/core/sock.c:3300 tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833 rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118 rds_send_xmit+0xbed/0x1500 net/rds/send.c:367 rds_send_worker+0x43/0x200 net/rds/threads.c:200 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298 worker_thread+0x616/0xa70 kernel/workqueue.c:2445 kthread+0x2c7/0x2e0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 value changed: 0x00027cc2 -> 0x00000000 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G W 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: krdsd rds_send_worker Note: I chose an arbitrary commit for the Fixes: tag, because I do not think we need to backport this fix to very old kernels. Fixes: e37542ba111f ("netfilter: conntrack: avoid possible false sharing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08netfilter: nft_exthdr: break evaluation if setting TCP option failsPablo Neira Ayuso1-4/+7
Break rule evaluation on malformed TCP options. Fixes: 99d1712bc41c ("netfilter: exthdr: tcp option set support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groupsStefano Brivio1-1/+1
The sixth byte of packet data has to be looked up in the sixth group, not in the seventh one, even if we load the bucket data into ymm6 (and not ymm5, for convenience of tracking stalls). Without this fix, matching on a MAC address as first field of a set, if 8-bit groups are selected (due to a small set size) would fail, that is, the given MAC address would never match. Reported-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-By: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-30netfilter: nft_fwd_netdev: Support egress hookPablo Neira Ayuso1-2/+5
Allow packet redirection to another interface upon egress. [lukas: set skb_iif, add commit message, original patch from Pablo. ] Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-30netfilter: nfnetlink_queue: silence bogus compiler warningFlorian Westphal1-1/+1
net/netfilter/nfnetlink_queue.c:601:36: warning: variable 'ctinfo' is uninitialized when used here [-Wuninitialized] if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0) ctinfo is only uninitialized if ct == NULL. Init it to 0 to silence this. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-30netfilter: ctnetlink: remove useless type conversion to boolBernard Zhao1-1/+1
dying is bool, the type conversion to true/false value is not needed. Signed-off-by: Bernard Zhao <bernard@vivo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>