linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-02-12	netfilter: compat: initialize all fields in xt_init	Francesco Ruggeri	1	-1/+1
	If a non zero value happens to be in xt[NFPROTO_BRIDGE].cur at init time, the following panic can be caused by running % ebtables -t broute -F BROUTING from a 32-bit user level on a 64-bit kernel. This patch replaces kmalloc_array with kcalloc when allocating xt. [ 474.680846] BUG: unable to handle kernel paging request at 0000000009600920 [ 474.687869] PGD 2037006067 P4D 2037006067 PUD 2038938067 PMD 0 [ 474.693838] Oops: 0000 [#1] SMP [ 474.697055] CPU: 9 PID: 4662 Comm: ebtables Kdump: loaded Not tainted 4.19.17-11302235.AroraKernelnext.fc18.x86_64 #1 [ 474.707721] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0 06/28/2013 [ 474.714313] RIP: 0010:xt_compat_calc_jump+0x2f/0x63 [x_tables] [ 474.720201] Code: 40 0f b6 ff 55 31 c0 48 6b ff 70 48 03 3d dc 45 00 00 48 89 e5 8b 4f 6c 4c 8b 47 60 ff c9 39 c8 7f 2f 8d 14 08 d1 fa 48 63 fa <41> 39 34 f8 4c 8d 0c fd 00 00 00 00 73 05 8d 42 01 eb e1 76 05 8d [ 474.739023] RSP: 0018:ffffc9000943fc58 EFLAGS: 00010207 [ 474.744296] RAX: 0000000000000000 RBX: ffffc90006465000 RCX: 0000000002580249 [ 474.751485] RDX: 00000000012c0124 RSI: fffffffff7be17e9 RDI: 00000000012c0124 [ 474.758670] RBP: ffffc9000943fc58 R08: 0000000000000000 R09: ffffffff8117cf8f [ 474.765855] R10: ffffc90006477000 R11: 0000000000000000 R12: 0000000000000001 [ 474.773048] R13: 0000000000000000 R14: ffffc9000943fcb8 R15: ffffc9000943fcb8 [ 474.780234] FS: 0000000000000000(0000) GS:ffff88a03f840000(0063) knlGS:00000000f7ac7700 [ 474.788612] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 474.794632] CR2: 0000000009600920 CR3: 0000002037422006 CR4: 00000000000606e0 [ 474.802052] Call Trace: [ 474.804789] compat_do_replace+0x1fb/0x2a3 [ebtables] [ 474.810105] compat_do_ebt_set_ctl+0x69/0xe6 [ebtables] [ 474.815605] ? try_module_get+0x37/0x42 [ 474.819716] compat_nf_setsockopt+0x4f/0x6d [ 474.824172] compat_ip_setsockopt+0x7e/0x8c [ 474.828641] compat_raw_setsockopt+0x16/0x3a [ 474.833220] compat_sock_common_setsockopt+0x1d/0x24 [ 474.838458] __compat_sys_setsockopt+0x17e/0x1b1 [ 474.843343] ? __check_object_size+0x76/0x19a [ 474.847960] __ia32_compat_sys_socketcall+0x1cb/0x25b [ 474.853276] do_fast_syscall_32+0xaf/0xf6 [ 474.857548] entry_SYSENTER_compat+0x6b/0x7a Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	7	-115/+108
	An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-05	netfilter: nft_compat: don't use refcount_inc on newly allocated entry	Florian Westphal	1	-39/+23
	When I moved the refcount to refcount_t type I missed the fact that refcount_inc() will result in use-after-free warning with CONFIG_REFCOUNT_FULL=y builds. The correct fix would be to init the reference count to 1 at allocation time, but, unfortunately we cannot do this, as we can't undo that in case something else fails later in the batch. So only solution I see is to special-case the 'new entry' condition and replace refcount_inc() with a "delayed" refcount_set(1) in this case, as done here. The .activate callback can be removed to simplify things, we only need to make sure that deactivate() decrements/unlinks the entry from the list at end of transaction phase (commit or abort). Fixes: 12c44aba6618 ("netfilter: nft_compat: use refcnt_t type for nft_xt reference count") Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04	netfilter: ipv6: avoid indirect calls for IPV6=y case	Florian Westphal	2	-15/+7
	indirect calls are only needed if ipv6 is a module. Add helpers to abstract the v6ops indirections and use them instead. fragment, reroute and route_input are kept as indirect calls. The first two are not not used in hot path and route_input is only used by bridge netfilter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04	netfilter: nf_tables: unbind set in rule from commit path	Pablo Neira Ayuso	6	-79/+72
	Anonymous sets that are bound to rules from the same transaction trigger a kernel splat from the abort path due to double set list removal and double free. This patch updates the logic to search for the transaction that is responsible for creating the set and disable the set list removal and release, given the rule is now responsible for this. Lookup is reverse since the transaction that adds the set is likely to be at the tail of the list. Moreover, this patch adds the unbind step to deliver the event from the commit path. This should not be done from the worker thread, since we have no guarantees of in-order delivery to the listener. This patch removes the assumption that both activate and deactivate callbacks need to be provided. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04	netfilter: nft_tunnel: Add NFTA_TUNNEL_MODE options	wenxu	1	-2/+32
	nft "tunnel" expr match both the tun_info of RX and TX. This patch provide the NFTA_TUNNEL_MODE to individually match the tun_info of RX or TX. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04	netfilter: nf_nat: skip nat clash resolution for same-origin entries	Martynas Pumputis	1	-0/+16
	It is possible that two concurrent packets originating from the same socket of a connection-less protocol (e.g. UDP) can end up having different IP_CT_DIR_REPLY tuples which results in one of the packets being dropped. To illustrate this, consider the following simplified scenario: 1. Packet A and B are sent at the same time from two different threads by same UDP socket. No matching conntrack entry exists yet. Both packets cause allocation of a new conntrack entry. 2. get_unique_tuple gets called for A. No clashing entry found. conntrack entry for A is added to main conntrack table. 3. get_unique_tuple is called for B and will find that the reply tuple of B is already taken by A. It will allocate a new UDP source port for B to resolve the clash. 4. conntrack entry for B cannot be added to main conntrack table because its ORIGINAL direction is clashing with A and the REPLY directions of A and B are not the same anymore due to UDP source port reallocation done in step 3. This patch modifies nf_conntrack_tuple_taken so it doesn't consider colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal. [ Florian: simplify patch to not use .allow_clash setting and always ignore identical flows ] Signed-off-by: Martynas Pumputis <martynas@weave.works> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-29	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	3	-47/+158

2019-01-29	netfilter: nf_tables: add NFTA_RULE_POSITION_ID to nla_policy	Florian Westphal	1	-0/+1
	Fixes: 75dd48e2e420a ("netfilter: nf_tables: Support RULE_ID reference in new rule") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next	David S. Miller	43	-1567/+1034
	Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree: 1) Introduce a hashtable to speed up object lookups, from Florian Westphal. 2) Make direct calls to built-in extension, also from Florian. 3) Call helper before confirming the conntrack as it used to be originally, from Florian. 4) Call request_module() to autoload br_netfilter when physdev is used to relax the dependency, also from Florian. 5) Allow to insert rules at a given position ID that is internal to the batch, from Phil Sutter. 6) Several patches to replace conntrack indirections by direct calls, and to reduce modularization, from Florian. This also includes several follow up patches to deal with minor fallout from this rework. 7) Use RCU from conntrack gre helper, from Florian. 8) GRE conntrack module becomes built-in into nf_conntrack, from Florian. 9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(), from Florian. 10) Unify sysctl handling at the core of nf_conntrack, from Florian. 11) Provide modparam to register conntrack hooks. 12) Allow to match on the interface kind string, from wenxu. 13) Remove several exported symbols, not required anymore now after a bit of de-modulatization work has been done, from Florian. 14) Remove built-in map support in the hash extension, this can be done with the existing userspace infrastructure, from laura. 15) Remove indirection to calculate checksums in IPVS, from Matteo Croce. 16) Use call wrappers for indirection in IPVS, also from Matteo. 17) Remove superfluous __percpu parameter in nft_counter, patch from Luc Van Oostenryck. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28	netfilter: ipv4: remove useless export_symbol	Florian Westphal	1	-0/+19
	Only one caller; place it where needed and get rid of the EXPORT_SYMBOL. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28	netfilter: conntrack: fix error path in nf_conntrack_pernet_init()	Cong Wang	1	-2/+2
	When nf_ct_netns_get() fails, it should clean up itself, its caller doesn't need to call nf_conntrack_fini_net(). nf_conntrack_init_net() is called after registering sysctl and proc, so its cleanup function should be called before unregistering sysctl and proc. Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks") Fixes: b884fa461776 ("netfilter: conntrack: unify sysctl handling") Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28	netfilter: nft_counter: remove wrong __percpu of nft_counter_resest()'s arg	Luc Van Oostenryck	1	-1/+1
	nft_counter_rest() has its first argument declared as struct nft_counter_percpu_priv __percpu *priv but this structure is not percpu (it only countains a member 'counter' which is, correctly, a pointer to a percpu struct nft_counter). So, remove the '__percpu' from the argument's declaration. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28	ipvs: use indirect call wrappers	Matteo Croce	3	-10/+45
	Use the new indirect call wrappers in IPVS when calling the TCP or UDP protocol specific functions. This avoids an indirect calls in IPVS, and reduces the performance impact of the Spectre mitigation. Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28	ipvs: avoid indirect calls when calculating checksums	Matteo Croce	4	-15/+19
	The function pointer ip_vs_protocol->csum_check is only used in protocol specific code, and never in the generic one. Remove the function pointer from struct ip_vs_protocol and call the checksum functions directly. This reduces the performance impact of the Spectre mitigation, and should give a small improvement even with RETPOLINES disabled. Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28	netfilter: nfnetlink_osf: add missing fmatch check	Fernando Fernandez Mancera	1	-0/+4
	When we check the tcp options of a packet and it doesn't match the current fingerprint, the tcp packet option pointer must be restored to its initial value in order to do the proper tcp options check for the next fingerprint. Here we can see an example. Assumming the following fingerprint base with two lines: S10:64:1:60:M,S,T,N,W6: Linux:3.0::Linux 3.0 S20:64:1:60:M,S,T,N,W7: Linux:4.19:arch:Linux 4.1 Where TCP options are the last field in the OS signature, all of them overlap except by the last one, ie. 'W6' versus 'W7'. In case a packet for Linux 4.19 kicks in, the osf finds no matching because the TCP options pointer is updated after checking for the TCP options in the first line. Therefore, reset pointer back to where it should be. Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-24	ipvs: Fix signed integer overflow when setsockopt timeout	ZhangXiaoxu	1	-0/+12
	There is a UBSAN bug report as below: UBSAN: Undefined behaviour in net/netfilter/ipvs/ip_vs_ctl.c:2227:21 signed integer overflow: -2147483647 * 1000 cannot be represented in type 'int' Reproduce program: #include <stdio.h> #include <sys/types.h> #include <sys/socket.h> #define IPPROTO_IP 0 #define IPPROTO_RAW 255 #define IP_VS_BASE_CTL (64+1024+64) #define IP_VS_SO_SET_TIMEOUT (IP_VS_BASE_CTL+10) /* The argument to IP_VS_SO_GET_TIMEOUT / struct ipvs_timeout_t { int tcp_timeout; int tcp_fin_timeout; int udp_timeout; }; int main() { int ret = -1; int sockfd = -1; struct ipvs_timeout_t to; sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); if (sockfd == -1) { printf("socket init error\n"); return -1; } to.tcp_timeout = -2147483647; to.tcp_fin_timeout = -2147483647; to.udp_timeout = -2147483647; ret = setsockopt(sockfd, IPPROTO_IP, IP_VS_SO_SET_TIMEOUT, (char )(&to), sizeof(to)); printf("setsockopt return %d\n", ret); return ret; } Return -EINVAL if the timeout value is negative or max than 'INT_MAX / HZ'. Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-22	netfilter: conntrack: fix bogus port values for other l4 protocols	Florian Westphal	1	-11/+35
	We must only extract l4 proto information if we can track the layer 4 protocol. Before removal of pkt_to_tuple callback, the code to extract port information was only reached for TCP/UDP/LITE/DCCP/SCTP. The other protocols were handled by the indirect call, and the 'generic' tracker took care of other protocols that have no notion of 'ports'. After removal of the callback we must be more strict here and only init port numbers for those protocols that have ports. Fixes: df5e1629087a ("netfilter: conntrack: remove pkt_to_tuple callback") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-22	netfilter: conntrack: fix IPV6=n builds	Florian Westphal	2	-0/+10
	Stephen Rothwell reports: After merging the netfilter-next tree, today's linux-next build (powerpc ppc64_defconfig) failed like this: ERROR: "nf_conntrack_invert_icmpv6_tuple" [nf_conntrack.ko] undefined! ERROR: "nf_conntrack_icmpv6_packet" [nf_conntrack.ko] undefined! ERROR: "nf_conntrack_icmpv6_init_net" [nf_conntrack.ko] undefined! ERROR: "icmpv6_pkt_to_tuple" [nf_conntrack.ko] undefined! ERROR: "nf_ct_gre_keymap_destroy" [nf_conntrack.ko] undefined! icmpv6 related errors are due to lack of IS_ENABLED(CONFIG_IPV6) (no icmpv6 support is builtin if kernel has CONFIG_IPV6=n), the nf_ct_gre_keymap_destroy error is due to lack of PROTO_GRE check. Fixes: a47c54048162 ("netfilter: conntrack: handle builtin l4proto packet functions via direct calls") Fixes: e2e48b471634 ("netfilter: conntrack: handle icmp pkt_to_tuple helper via direct calls") Fixes: 197c4300aec0 ("netfilter: conntrack: remove invert_tuple callback") Fixes: 2a389de86e4a ("netfilter: conntrack: remove l4proto init and get_net callbacks") Fixes: e56894356f60 ("netfilter: conntrack: remove l4proto destroy hook") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	Revert "netfilter: nft_hash: add map lookups for hashing operations"	Laura Garcia Liebana	1	-121/+0
	A better way to implement this from userspace has been found without specific code in the kernel side, revert this. Fixes: b9ccc07e3f31 ("netfilter: nft_hash: add map lookups for hashing operations") Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nat: un-export nf_nat_used_tuple	Florian Westphal	1	-2/+1
	Not used since 203f2e78200c27e ("netfilter: nat: remove l4proto->unique_tuple") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nft_meta: Add NFT_META_I/OIFKIND meta type	wenxu	1	-0/+12
	In the ip_rcv the skb goes through the PREROUTING hook first, then kicks in vrf device and go through the same hook again. When conntrack dnat works with vrf, there will be some conflict with rules because the packet goes through the hook twice with different nf status. ip link add user1 type vrf table 1 ip link add user2 type vrf table 2 ip l set dev tun1 master user1 ip l set dev tun2 master user2 nft add table firewall nft add chain firewall zones { type filter hook prerouting priority - 300 \; } nft add rule firewall zones counter ct zone set iif map { "tun1" : 1, "tun2" : 2 } nft add chain firewall rule-1000-ingress nft add rule firewall rule-1000-ingress ct zone 1 tcp dport 22 ct state new counter accept nft add rule firewall rule-1000-ingress counter drop nft add chain firewall rule-1000-egress nft add rule firewall rule-1000-egress tcp dport 22 ct state new counter drop nft add rule firewall rule-1000-egress counter accept nft add chain firewall rules-all { type filter hook prerouting priority - 150 \; } nft add rule firewall rules-all ip daddr vmap { "2.2.2.11" : jump rule-1000-ingress } nft add rule firewall rules-all ct zone vmap { 1 : jump rule-1000-egress } nft add rule firewall dnat-all ct zone vmap { 1 : jump dnat-1000 } nft add rule firewall dnat-1000 ip daddr 2.2.2.11 counter dnat to 10.0.0.7 For a package with ip daddr 2.2.2.11 and tcp dport 22, first time accept in the rule-1000-ingress and dnat to 10.0.0.7. Then second time the packet goto the wrong chain rule-1000-egress which leads the packet drop With this patch, userspace can add the 'don't re-do entire ruleset for vrf' policy itself via: nft add rule firewall rules-all meta iifkind "vrf" counter accept Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nf_conntrack: provide modparam to always register conntrack hooks	Pablo Neira Ayuso	1	-4/+24
	The connection tracking hooks can be optionally registered per netns when conntrack is specifically invoked from the ruleset since 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset"). Then, since 4d3a57f23dec ("netfilter: conntrack: do not enable connection tracking unless needed"), the default behaviour is changed to always register them on demand. This patch provides a toggle that allows users to always register them. Without this toggle, in order to use conntrack for statistics collection, you need a dummy rule that refers to conntrack, eg. iptables -I INPUT -m state --state NEW This patch allows users to restore the original behaviour via modparam, ie. always register connection tracking, eg. modprobe nf_conntrack enable_hooks=1 Hence, no dummy rule is required. Reported-by: Laura Garcia <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove nf_ct_l4proto_find_get	Florian Westphal	9	-174/+43
	Its now same as __nf_ct_l4proto_find(), so rename that to nf_ct_l4proto_find and use it everywhere. It never returns NULL and doesn't need locks or reference counts. Before this series: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko After: 294864 net/netfilter/nf_conntrack.ko text data bss dec hex filename 106979 19557 240 126776 1ef38 nf_conntrack.ko so, even with builtin gre, total size got reduced. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove l4proto destroy hook	Florian Westphal	2	-18/+11
	Only one user (gre), add a direct call and remove this facility. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove l4proto init and get_net callbacks	Florian Westphal	9	-217/+56
	Those were needed we still had modular trackers. As we don't have those anymore, prefer direct calls and remove all the (un)register infrastructure associated with this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove sysctl registration helpers	Florian Westphal	1	-76/+1
	After previous patch these are not used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: unify sysctl handling	Florian Westphal	9	-460/+391
	Due to historical reasons, all l4 trackers register their own sysctls. This leads to copy&pasted boilerplate code, that does exactly same thing, just with different data structure. Place all of this in a single file. This allows to remove the various ctl_table pointers from the ct_netns structure and reduces overall code size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: avoid unneeded nf_conntrack_l4proto lookups	Florian Westphal	5	-61/+18
	after removal of the packet and invert function pointers, several places do not need to lookup the l4proto structure anymore. Remove those lookups. The function nf_ct_invert_tuplepr becomes redundant, replace it with nf_ct_invert_tuple everywhere. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove pernet l4 proto register interface	Florian Westphal	1	-16/+12
	No used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove remaining l4proto indirect packet calls	Florian Westphal	3	-48/+24
	Now that all l4trackers are builtin, no need to use a mix of direct and indirect calls. This removes the last two users: gre and the generic l4 protocol tracker. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove module owner field	Florian Westphal	4	-17/+0
	No need to get/put module owner reference, none of these can be removed anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove invert_tuple callback	Florian Westphal	3	-8/+10
	Only used by icmp(v6). Prefer a direct call and remove this function from the l4proto struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove pkt_to_tuple callback	Florian Westphal	3	-16/+6
	GRE is now builtin, so we can handle it via direct call and remove the callback. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove net_id	Florian Westphal	1	-6/+2
	No users anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: gre: switch module to be built-in	Florian Westphal	5	-85/+27
	This makes the last of the modular l4 trackers 'bool'. After this, all infrastructure to handle dynamic l4 protocol registration becomes obsolete and can be removed in followup patches. Old: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko New: 313728 net/netfilter/nf_conntrack.ko Old: text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko New: 112095 21381 240 133716 20a54 nf_conntrack.ko The size increase is only temporary. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: gre: convert rwlock to rcu	Florian Westphal	1	-22/+15
	We can use gre. Lock is only needed when a new expectation is added. In case a single spinlock proves to be problematic we can either add one per netns or use an array of locks combined with net_hash_mix() or similar to pick the 'correct' one. But given this is only needed for an expectation rather than per packet a single one should be ok. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: handle icmp pkt_to_tuple helper via direct calls	Florian Westphal	3	-8/+12
	rather than handling them via indirect call, use a direct one instead. This leaves GRE as the last user of this indirect call facility. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: handle builtin l4proto packet functions via direct calls	Florian Westphal	7	-44/+76
	The l4 protocol trackers are invoked via indirect call: l4proto->packet(). With one exception (gre), all l4trackers are builtin, so we can make .packet optional and use a direct call for most protocols. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nf_tables: Support RULE_ID reference in new rule	Phil Sutter	1	-0/+9
	To allow for a batch to contain rules in arbitrary ordering, introduce NFTA_RULE_POSITION_ID attribute which works just like NFTA_RULE_POSITION but contains the ID of another rule within the same batch. This helps iptables-nft-restore handling dumps with mixed insert/append commands correctly. Note that NFTA_RULE_POSITION takes precedence over NFTA_RULE_POSITION_ID, so if the former is present, the latter is ignored. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: physdev: relax br_netfilter dependency	Florian Westphal	1	-2/+7
	Following command: iptables -D FORWARD -m physdev ... causes connectivity loss in some setups. Reason is that iptables userspace will probe kernel for the module revision of the physdev patch, and physdev has an artificial dependency on br_netfilter (xt_physdev use makes no sense unless a br_netfilter module is loaded). This causes the "phydev" module to be loaded, which in turn enables the "call-iptables" infrastructure. bridged packets might then get dropped by the iptables ruleset. The better fix would be to change the "call-iptables" defaults to 0 and enforce explicit setting to 1, but that breaks backwards compatibility. This does the next best thing: add a request_module call to checkentry. This was a stray '-D ... -m physdev' won't activate br_netfilter anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: conntrack: remove helper hook again	Florian Westphal	1	-106/+36
	place them into the confirm one. Old: hook (300): ipv4/6_help() first call helper, then seqadj. hook (INT_MAX): confirm Now: hook (INT_MAX): confirm, first call helper, then seqadj, then confirm Not having the extra call is noticeable in bechmarks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nf_tables: add direct calls for all builtin expressions	Florian Westphal	9	-31/+39
	With CONFIG_RETPOLINE its faster to add an if (ptr == &foo_func) check and and use direct calls for all the built-in expressions. ~15% improvement in pathological cases. checkpatch doesn't like the X macro due to the embedded return statement, but the macro has a very limited scope so I don't think its a problem. I would like to avoid bugs of the form If (e->ops->eval == (unsigned long)nft_foo_eval) nft_bar_eval(); and open-coded if ()/else if()/else cascade, thus the macro. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nf_tables: handle nft_object lookups via rhltable	Florian Westphal	2	-13/+93
	Instead of linear search, use rhlist interface to look up the objects. This fixes rulesets with thousands of named objects (quota, counters and the like). We only use a single table for this and consider the address of the table we're doing the lookup in as a part of the key. This reduces restore time of a sample ruleset with ~20k named counters from 37 seconds to 0.8 seconds. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nf_tables: prepare nft_object for lookups via hashtable	Florian Westphal	3	-13/+18
	Add a 'key' structure for object, so we can look them up by name + table combination (the name can be the same in each table). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nft_compat: destroy function must not have side effects	Florian Westphal	1	-1/+47
	The nft_compat destroy function deletes the nft_xt object from a list. This isn't allowed anymore. Destroy functions are called asynchronously, i.e. next batch can find the object that has a pending ->destroy() invocation: cpu0 cpu1 worker ->destroy for_each_entry() if (x == ... return x->ops; list_del(x) kfree_rcu(x) expr->ops->... // ops was free'd To resolve this, the list_del needs to occur before the transaction mutex gets released. nf_tables has a 'deactivate' hook for this purpose, so use that to unlink the object from the list. Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release") Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nft_compat: make lists per netns	Florian Westphal	1	-40/+89
	There are two problems with nft_compat since the netlink config plane uses a per-netns mutex: 1. Concurrent add/del accesses to the same list 2. accesses to a list element after it has been free'd already. This patch fixes the first problem. Freeing occurs from a work queue, after transaction mutexes have been released, i.e., it still possible for a new transaction (even from same net ns) to find the to-be-deleted expression in the list. The ->destroy functions are not allowed to have any such side effects, i.e. the list_del() in the destroy function is not allowed. This part of the problem is solved in the next patch. I tried to make this work by serializing list access via mutex and by moving list_del() to a deactivate callback, but Taehee spotted following race on this approach: NET #0 NET #1 >select_ops() ->init() ->select_ops() ->deactivate() ->destroy() nft_xt_put() kfree_rcu(xt, rcu_head); ->init() <-- use-after-free occurred. Unfortunately, we can't increment reference count in select_ops(), because we can't undo the refcount increase in case a different expression fails in the same batch. (The destroy hook will only be called in case the expression was initialized successfully). Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18	netfilter: nft_compat: use refcnt_t type for nft_xt reference count	Florian Westphal	1	-8/+8
	Using standard integer type was fine while all operations on it were guarded by the nftnl subsys mutex. This isn't true anymore: 1. transactions are guarded only by a pernet mutex, so concurrent rule manipulation in different netns is racy 2. the ->destroy hook runs from a work queue after the transaction mutex has been released already. cpu0 cpu1 (net 1) cpu2 (net 2) kworker nft_compat->destroy nft_compat->init nft_compat->init if (--nft_xt->ref == 0) nft_xt->ref++ nft_xt->ref++ Switch to refcount_t. Doing this however only fixes a minor aspect, nft_compat also performs linked-list operations in an unsafe way. This is addressed in the next two patches. Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release") Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf	David S. Miller	3	-14/+18
	Pablo Neira Ayuso says: ==================== Netfilter fixes for net This is the first batch of Netfilter fixes for your net tree: 1) Fix endless loop in nf_tables rules netlink dump, from Phil Sutter. 2) Reference counter leak in object from the error path, from Taehee Yoo. 3) Selective rule dump requires table and chain. 4) Fix DNAT with nft_flow_offload reverse route lookup, from wenxu. 5) Use GFP_KERNEL_ACCOUNT in vmalloc allocation from ebtables, from Shakeel Butt. 6) Set ifindex from route to fix interaction with VRF slave device, also from wenxu. 7) Use nfct_help() to check for conntrack helper, IPS_HELPER status flag is only set from explicit helpers via -j CT, from Henry Yen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-14	netfilter: nft_flow_offload: fix checking method of conntrack helper	Henry Yen	1	-1/+4
	This patch uses nfct_help() to detect whether an established connection needs conntrack helper instead of using test_bit(IPS_HELPER_BIT, &ct->status). The reason is that IPS_HELPER_BIT is only set when using explicit CT target. However, in the case that a device enables conntrack helper via command "echo 1 > /proc/sys/net/netfilter/nf_conntrack_helper", the status of IPS_HELPER_BIT will not present any change, and consequently it loses the checking ability in the context. Signed-off-by: Henry Yen <henry.yen@mediatek.com> Reviewed-by: Ryder Lee <ryder.lee@mediatek.com> Tested-by: John Crispin <john@phrozen.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>