aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6/netfilter (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-07-03Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+3
Simple overlapping changes in stmmac driver. Adjust skb_gro_flush_final_remcsum function signature to make GRO list changes in net-next, as per Stephen Rothwell's example merge resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28netfilter: check if the socket netns is correct.Flavio Leitner1-4/+4
Netfilter assumes that if the socket is present in the skb, then it can be used because that reference is cleaned up while the skb is crossing netns. We want to change that to preserve the socket reference in a future patch, so this is a preparation updating netfilter to check if the socket netns matches before use it. Signed-off-by: Flavio Leitner <fbl@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-18netfilter: ipv6: nf_defrag: reduce struct net memory wasteEric Dumazet1-3/+3
It is a waste of memory to use a full "struct netns_sysctl_ipv6" while only one pointer is really used, considering netns_sysctl_ipv6 keeps growing. Also, since "struct netns_frags" has cache line alignment, it is better to move the frags_hdr pointer outside, otherwise we spend a full cache line for this pointer. This saves 192 bytes of memory per netns. Fixes: c038a767cd69 ("ipv6: add a new namespace for nf_conntrack_reasm") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-0/+1
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Reject non-null terminated helper names from xt_CT, from Gao Feng. 2) Fix KASAN splat due to out-of-bound access from commit phase, from Alexey Kodanev. 3) Missing conntrack hook registration on IPVS FTP helper, from Julian Anastasov. 4) Incorrect skbuff allocation size in bridge nft_reject, from Taehee Yoo. 5) Fix inverted check on packet xmit to non-local addresses, also from Julian. 6) Fix ebtables alignment compat problems, from Alin Nastac. 7) Hook mask checks are not correct in xt_set, from Serhey Popovych. 8) Fix timeout listing of element in ipsets, from Jozsef. 9) Cap maximum timeout value in ipset, also from Jozsef. 10) Don't allow family option for hash:mac sets, from Florent Fourcot. 11) Restrict ebtables to work with NFPROTO_BRIDGE targets only, this Florian. 12) Another bug reported by KASAN in the rbtree set backend, from Taehee Yoo. 13) Missing __IPS_MAX_BIT update doesn't include IPS_OFFLOAD_BIT. From Gao Feng. 14) Missing initialization of match/target in ebtables, from Florian Westphal. 15) Remove useless nft_dup.h file in include path, from C. Labbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-08netfilter: x_tables: initialise match/target check parameter structFlorian Westphal1-0/+1
syzbot reports following splat: BUG: KMSAN: uninit-value in ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162 ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162 xt_check_match+0x1438/0x1650 net/netfilter/x_tables.c:506 ebt_check_match net/bridge/netfilter/ebtables.c:372 [inline] ebt_check_entry net/bridge/netfilter/ebtables.c:702 [inline] The uninitialised access is xt_mtchk_param->nft_compat ... which should be set to 0. Fix it by zeroing the struct beforehand, same for tgchk. ip(6)tables targetinfo uses c99-style initialiser, so no change needed there. Reported-by: syzbot+da4494182233c23a5fcf@syzkaller.appspotmail.com Fixes: 55917a21d0cc0 ("netfilter: x_tables: add context to know if extension runs from nft_compat") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: Libify xt_TPROXYMáté Eckl3-1/+151
The extracted functions will likely be usefull to implement tproxy support in nf_tables. Extrancted functions: - nf_tproxy_sk_is_transparent - nf_tproxy_laddr4 - nf_tproxy_handle_time_wait4 - nf_tproxy_get_sock_v4 - nf_tproxy_laddr6 - nf_tproxy_handle_time_wait6 - nf_tproxy_get_sock_v6 (nf_)tproxy_handle_time_wait6 also needed some refactor as its current implementation was xtables-specific. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-29netfilter: nat: merge ipv4/ipv6 masquerade code into main nat moduleFlorian Westphal3-9/+2
Instead of using extra modules for these, turn the config options into an implicit dependency that adds masq feature to the protocol specific nf_nat module. before: text data bss dec hex filename 2001 860 4 2865 b31 net/ipv4/netfilter/nf_nat_masquerade_ipv4.ko 5579 780 2 6361 18d9 net/ipv4/netfilter/nf_nat_ipv4.ko 2860 836 8 3704 e78 net/ipv6/netfilter/nf_nat_masquerade_ipv6.ko 6648 780 2 7430 1d06 net/ipv6/netfilter/nf_nat_ipv6.ko after: text data bss dec hex filename 7245 872 8 8125 1fbd net/ipv4/netfilter/nf_nat_ipv4.ko 9165 848 12 10025 2729 net/ipv6/netfilter/nf_nat_ipv6.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller5-154/+114
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal. 2) Add support for map lookups to numgen, random and hash expressions, from Laura Garcia. 3) Allow to register nat hooks for iptables and nftables at the same time. Patchset from Florian Westpha. 4) Timeout support for rbtree sets. 5) ip6_rpfilter works needs interface for link-local addresses, from Vincent Bernat. 6) Add nf_ct_hook and nf_nat_hook structures and use them. 7) Do not drop packets on packets raceing to insert conntrack entries into hashes, this is particularly a problem in nfqueue setups. 8) Address fallout from xt_osf separation to nf_osf, patches from Florian Westphal and Fernando Mancera. 9) Remove reference to struct nft_af_info, which doesn't exist anymore. From Taehee Yoo. This batch comes with is a conflict between 25fd386e0bc0 ("netfilter: core: add missing __rcu annotation") in your tree and 2c205dd3981f ("netfilter: add struct nf_nat_hook and use it") coming in this batch. This conflict can be solved by leaving the __rcu tag on __netfilter_net_init() - added by 25fd386e0bc0 - and remove all code related to nf_nat_decode_session_hook - which is gone after 2c205dd3981f, as described by: diff --cc net/netfilter/core.c index e0ae4aae96f5,206fb2c4c319..168af54db975 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo EXPORT_SYMBOL_GPL(nf_ct_zone_dflt); #endif /* CONFIG_NF_CONNTRACK */ - static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max) -#ifdef CONFIG_NF_NAT_NEEDED -void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *); -EXPORT_SYMBOL(nf_nat_decode_session_hook); -#endif - + static void __net_init + __netfilter_net_init(struct nf_hook_entries __rcu **e, int max) { int h; I can also merge your net-next tree into nf-next, solve the conflict and resend the pull request if you prefer so. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23netfilter: ip6t_rpfilter: provide input interface for route lookupVincent Bernat1-0/+2
In commit 47b7e7f82802, this bit was removed at the same time the RT6_LOOKUP_F_IFACE flag was removed. However, it is needed when link-local addresses are used, which is a very common case: when packets are routed, neighbor solicitations are done using link-local addresses. For example, the following neighbor solicitation is not matched by "-m rpfilter": IP6 fe80::5254:33ff:fe00:1 > ff02::1:ff00:3: ICMP6, neighbor solicitation, who has 2001:db8::5254:33ff:fe00:3, length 32 Commit 47b7e7f82802 doesn't quite explain why we shouldn't use RT6_LOOKUP_F_IFACE in the rpfilter case. I suppose the interface check later in the function would make it redundant. However, the remaining of the routing code is using RT6_LOOKUP_F_IFACE when there is no source address (which matches rpfilter's case with a non-unicast destination, like with neighbor solicitation). Signed-off-by: Vincent Bernat <vincent@bernat.im> Fixes: 47b7e7f82802 ("netfilter: don't set F_IFACE on ipv6 fib lookups") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: nf_nat: add nat type hooks to nat coreFlorian Westphal3-112/+103
Currently the packet rewrite and instantiation of nat NULL bindings happens from the protocol specific nat backend. Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type. Invocation looks like this (simplified): NF_HOOK() | `---iptable_nat | `---> nf_nat_l3proto_ipv4 -> nf_nat_packet | new packet? pass skb though iptables nat chain | `---> iptable_nat: ipt_do_table In nft case, this looks the same (nft_chain_nat_ipv4 instead of iptable_nat). This is a problem for two reasons: 1. Can't use iptables nat and nf_tables nat at the same time, as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a NULL binding if do_table() did not find a matching nat rule so we can detect post-nat tuple collisions). 2. If you use e.g. nft_masq, snat, redir, etc. uses must also register an empty base chain so that the nat core gets called fro NF_HOOK() to do the reverse translation, which is neither obvious nor user friendly. After this change, the base hook gets registered not from iptable_nat or nftables nat hooks, but from the l3 nat core. iptables/nft nat base hooks get registered with the nat core instead: NF_HOOK() | `---> nf_nat_l3proto_ipv4 -> nf_nat_packet | new packet? pass skb through iptables/nftables nat chains | +-> iptables_nat: ipt_do_table +-> nft nat chain x `-> nft nat chain y The nat core deals with null bindings and reverse translation. When no mapping exists, it calls the registered nat lookup hooks until one creates a new mapping. If both iptables and nftables nat hooks exist, the first matching one is used (i.e., higher priority wins). Also, nft users do not need to create empty nat hooks anymore, nat core always registers the base hooks that take care of reverse/reply translation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: nf_tables: allow chain type to override hook registerFlorian Westphal1-6/+14
Will be used in followup patch when nat types no longer use nf_register_net_hook() but will instead register with the nat core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: xtables: allow table definitions not backed by hook_opsFlorian Westphal1-1/+4
The ip(6)tables nat table is currently receiving skbs from the netfilter core, after a followup patch skbs will be coming from the netfilter nat core instead, so the table is no longer backed by normal hook_ops. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: nf_nat: move common nat code to nat coreFlorian Westphal1-46/+2
Copy-pasted, both l3 helpers almost use same code here. Split out the common part into an 'inet' helper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+1
S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal. TLS data structures split the TX and RX components in 'net-next', put the new struct members from the bug fix in 'net' into the RX part. The 'net-next' tree had some reworking of how the ERSPAN code works in the GRE tunneling code, overlapping with a one-line headroom calculation fix in 'net'. Overlapping changes in __sock_map_ctx_update_elem(), keep the bits that read the prog members via READ_ONCE() into local variables before using them. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08netfilter: x_tables: add module alias for icmp matchesFlorian Westphal1-0/+1
The icmp matches are implemented in ip_tables and ip6_tables, respectively, so for normal iptables they are always available: those modules are loaded once iptables calls getsockopt() to fetch available module revisions. In iptables-over-nftables case probing occurs via nfnetlink, so these modules might not be loaded. Add aliases so modprobe can load these when icmp/icmp6 is requested. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller11-276/+180
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree, more relevant updates in this batch are: 1) Add Maglev support to IPVS. Moreover, store lastest server weight in IPVS since this is needed by maglev, patches from from Inju Song. 2) Preparation works to add iptables flowtable support, patches from Felix Fietkau. 3) Hand over flows back to conntrack slow path in case of TCP RST/FIN packet is seen via new teardown state, also from Felix. 4) Add support for extended netlink error reporting for nf_tables. 5) Support for larger timeouts that 23 days in nf_tables, patch from Florian Westphal. 6) Always set an upper limit to dynamic sets, also from Florian. 7) Allow number generator to make map lookups, from Laura Garcia. 8) Use hash_32() instead of opencode hashing in IPVS, from Vicent Bernat. 9) Extend ip6tables SRH match to support previous, next and last SID, from Ahmed Abdelsalam. 10) Move Passive OS fingerprint nf_osf.c, from Fernando Fernandez. 11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot. 12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables, from Taehee Yoo. 13) Unify meta bridge with core nft_meta, then make nft_meta built-in. Make rt and exthdr built-in too, again from Florian. 14) Missing initialization of tbl->entries in IPVS, from Cong Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-06netfilter: nf_nat: remove unused ct arg from lookup functionsFlorian Westphal3-13/+7
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-06netfilter: ip6t_srh: extend SRH matching for previous, next and last SIDAhmed Abdelsalam1-9/+164
IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed by SR encapsulated packet. Each SID is encoded as an IPv6 prefix. When a Firewall receives an SR encapsulated packet, it should be able to identify which node previously processed the packet (previous SID), which node is going to process the packet next (next SID), and which node is the last to process the packet (last SID) which represent the final destination of the packet in case of inline SR mode. An example use-case of using these features could be SID list that includes two firewalls. When the second firewall receives a packet, it can check whether the packet has been processed by the first firewall or not. Based on that check, it decides to apply all rules, apply just subset of the rules, or totally skip all rules and forward the packet to the next SID. This patch extends SRH match to support matching previous SID, next SID, and last SID. Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-24netfilter: x_tables: remove duplicate ip6t_get_target function callTaehee Yoo1-1/+0
In the check_target, ip6t_get_target is called twice. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-24netfilter: add NAT support for shifted portmap rangesThierry Du Tre6-8/+8
This is a patch proposal to support shifted ranges in portmaps. (i.e. tcp/udp incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100) Currently DNAT only works for single port or identical port ranges. (i.e. ports 5000-5100 on WAN interface redirected to a LAN host while original destination port is not altered) When different port ranges are configured, either 'random' mode should be used, or else all incoming connections are mapped onto the first port in the redirect range. (in described example WAN:5000-5100 will all be mapped to 192.168.1.5:2000) This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET which uses a base port value to calculate an offset with the destination port present in the incoming stream. That offset is then applied as index within the redirect port range (index modulo rangewidth to handle range overflow). In described example the base port would be 5000. An incoming stream with destination port 5004 would result in an offset value 4 which means that the NAT'ed stream will be using destination port 2004. Other possibilities include deterministic mapping of larger or multiple ranges to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port 51xx) This patch does not change any current behavior. It just adds new NAT proto range functionality which must be selected via the specific flag when intended to use. A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed which makes this functionality immediately available. Signed-off-by: Thierry Du Tre <thierry@dtsystems.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-24netfilter: nf_flow_table: move init code to nf_flow_table_core.cFelix Fietkau1-2/+1
Reduces duplication of .gc and .params in flowtable type definitions and makes the API clearer Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-24netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_tableFelix Fietkau1-232/+0
Useful as preparation for adding iptables support for offload. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-21netfilter: nf_flow_table: cache mtu in struct flow_offload_tupleFelix Fietkau1-14/+3
Reduces the number of cache lines touched in the offload forwarding path. This is safe because PMTU limits are bypassed for the forwarding path (see commit f87c10a8aa1e for more details). Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-19netfilter: nf_tables: NAT chain and extensions require NF_TABLESPablo Neira Ayuso1-27/+28
Move these options inside the scope of the 'if' NF_TABLES and NF_TABLES_IPV6 dependencies. This patch fixes: net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_nat_do_chain': >> net/ipv6/netfilter/nft_chain_nat_ipv6.c:37: undefined reference to `nft_do_chain' net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_chain_nat_ipv6_exit': >> net/ipv6/netfilter/nft_chain_nat_ipv6.c:94: undefined reference to `nft_unregister_chain_type' net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_chain_nat_ipv6_init': >> net/ipv6/netfilter/nft_chain_nat_ipv6.c:87: undefined reference to `nft_register_chain_type' that happens with: CONFIG_NF_TABLES=m CONFIG_NFT_CHAIN_NAT_IPV6=y Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-04inet: frags: fix ip6frag_low_thresh boundaryEric Dumazet1-2/+0
Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches, since linker might place next to it a non zero value preventing a change to ip6frag_low_thresh. ip6frag_low_thresh is not used anymore in the kernel, but we do not want to prematuraly break user scripts wanting to change it. Since specifying a minimal value of 0 for proc_doulongvec_minmax() is moot, let's remove these zero values in all defrag units. Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+4
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31inet: frags: get rid of nf_ct_frag6_skb_cb/NFCT_FRAG6_CBEric Dumazet1-18/+11
nf_ct_frag6_queue() uses skb->cb[] to store the fragment offset, meaning that we could use two cache lines per skb when finding the insertion point, if for some reason inet6_skb_parm size is increased in the future. By using skb->ip_defrag_offset instead of skb->cb[] we pack all the fields in a single cache line, matching what we did for IPv4. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31inet: frags: break the 2GB limit for frags storageEric Dumazet1-5/+5
Some users are willing to provision huge amounts of memory to be able to perform reassembly reasonnably well under pressure. Current memory tracking is using one atomic_t and integers. Switch to atomic_long_t so that 64bit arches can use more than 2GB, without any cost for 32bit arches. Note that this patch avoids an overflow error, if high_thresh was set to ~2GB, since this test in inet_frag_alloc() was never true : if (... || frag_mem_limit(nf) > nf->high_thresh) Tested: $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh <frag DDOS> $ grep FRAG /proc/net/sockstat FRAG: inuse 14705885 memory 16000002880 $ nstat -n ; sleep 1 ; nstat | grep Reas IpReasmReqds 3317150 0.0 IpReasmFails 3317112 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31inet: frags: remove inet_frag_maybe_warn_overflow()Eric Dumazet1-3/+2
This function is obsolete, after rhashtable addition to inet defrag. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31inet: frags: use rhashtables for reassembly unitsEric Dumazet1-38/+13
Some applications still rely on IP fragmentation, and to be fair linux reassembly unit is not working under any serious load. It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!) A work queue is supposed to garbage collect items when host is under memory pressure, and doing a hash rebuild, changing seed used in hash computations. This work queue blocks softirqs for up to 25 ms when doing a hash rebuild, occurring every 5 seconds if host is under fire. Then there is the problem of sharing this hash table for all netns. It is time to switch to rhashtables, and allocate one of them per netns to speedup netns dismantle, since this is a critical metric these days. Lookup is now using RCU. A followup patch will even remove the refcount hold/release left from prior implementation and save a couple of atomic operations. Before this patch, 16 cpus (16 RX queue NIC) could not handle more than 1 Mpps frags DDOS. After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB of storage for the fragments (exact number depends on frags being evicted after timeout) $ grep FRAG /proc/net/sockstat FRAG: inuse 1966916 memory 2140004608 A followup patch will change the limits for 64bit arches. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Florian Westphal <fw@strlen.de> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31inet: frags: add a pointer to struct netns_fragsEric Dumazet1-7/+9
In order to simplify the API, add a pointer to struct inet_frags. This will allow us to make things less complex. These functions no longer have a struct inet_frags parameter : inet_frag_destroy(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */) ip6_expire_frag_queue(struct net *net, struct frag_queue *fq) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31inet: frags: change inet_frags_init_net() return valueEric Dumazet1-3/+9
We will soon initialize one rhashtable per struct netns_frags in inet_frags_init_net(). This patch changes the return value to eventually propagate an error. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller7-96/+39
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. This batch comes with more input sanitization for xtables to address bug reports from fuzzers, preparation works to the flowtable infrastructure and assorted updates. In no particular order, they are: 1) Make sure userspace provides a valid standard target verdict, from Florian Westphal. 2) Sanitize error target size, also from Florian. 3) Validate that last rule in basechain matches underflow/policy since userspace assumes this when decoding the ruleset blob that comes from the kernel, from Florian. 4) Consolidate hook entry checks through xt_check_table_hooks(), patch from Florian. 5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject very large compat offset arrays, so we have a reasonable upper limit and fuzzers don't exercise the oom-killer. Patches from Florian. 6) Several WARN_ON checks on xtables mutex helper, from Florian. 7) xt_rateest now has a hashtable per net, from Cong Wang. 8) Consolidate counter allocation in xt_counters_alloc(), from Florian. 9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch from Xin Long. 10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from Felix Fietkau. 11) Consolidate code through flow_offload_fill_dir(), also from Felix. 12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward() to remove a dependency with flowtable and ipv6.ko, from Felix. 13) Cache mtu size in flow_offload_tuple object, this is safe for forwarding as f87c10a8aa1e describes, from Felix. 14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too modular infrastructure, from Felix. 15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from Ahmed Abdelsalam. 16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei. 17) Support for counting only to nf_conncount infrastructure, patch from Yi-Hung Wei. 18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes to nft_ct. 19) Use boolean as return value from ipt_ah and from IPVS too, patch from Gustavo A. R. Silva. 20) Remove useless parameters in nfnl_acct_overquota() and nf_conntrack_broadcast_help(), from Taehee Yoo. 21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo. 22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu. 23) Fix typo in xt_limit, from Geert Uytterhoeven. 24) Do no use VLAs in Netfilter code, again from Gustavo. 25) Use ADD_COUNTER from ebtables, from Taehee Yoo. 26) Bitshift support for CONNMARK and MARK targets, from Jack Ma. 27) Use pr_*() and add pr_fmt(), from Arushi Singhal. 28) Add synproxy support to ctnetlink. 29) ICMP type and IGMP matching support for ebtables, patches from Matthias Schiffer. 30) Support for the revision infrastructure to ebtables, from Bernie Harris. 31) String match support for ebtables, also from Bernie. 32) Documentation for the new flowtable infrastructure. 33) Use generic comparison functions in ebt_stp, from Joe Perches. 34) Demodularize filter chains in nftables. 35) Register conntrack hooks in case nftables NAT chain is added. 36) Merge assignments with return in a couple of spots in the Netfilter codebase, also from Arushi. 37) Document that xtables percpu counters are stored in the same memory area, from Ben Hutchings. 38) Revert mark_source_chains() sanity checks that break existing rulesets, from Florian Westphal. 39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30Revert "netfilter: x_tables: ensure last rule in base chain matches underflow/policy"Florian Westphal1-16/+1
This reverts commit 0d7df906a0e78079a02108b06d32c3ef2238ad25. Valdis Kletnieks reported that xtables is broken in linux-next since 0d7df906a0e78 ("netfilter: x_tables: ensure last rule in base chain matches underflow/policy"), as kernel rejects the (well-formed) ruleset: [ 64.402790] ip6_tables: last base chain position 1136 doesn't match underflow 1344 (hook 1) mark_source_chains is not the correct place for such a check, as it terminates evaluation of a chain once it sees an unconditional verdict (following rules are known to be unreachable). It seems preferrable to fix libiptc instead, so remove this check again. Fixes: 0d7df906a0e78 ("netfilter: x_tables: ensure last rule in base chain matches underflow/policy") Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30netfilter: nf_tables: enable conntrack if NAT chain is registeredPablo Neira Ayuso1-0/+12
Register conntrack hooks if the user adds NAT chains. Users get confused with the existing behaviour since they will see no packets hitting this chain until they add the first rule that refers to conntrack. This patch adds new ->init() and ->free() indirections to chain types that can be used by NAT chains to invoke the conntrack dependency. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30netfilter: nf_tables: build-in filter chain typePablo Neira Ayuso3-69/+1
One module per supported filter chain family type takes too much memory for very little code - too much modularization - place all chain filter definitions in one single file. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30netfilter: nf_tables: nft_register_chain_type() returns voidPablo Neira Ayuso3-7/+7
Use WARN_ON() instead since it should not happen that neither family goes over NFPROTO_NUMPROTO nor there is already a chain of this type already registered. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30netfilter: nf_tables: rename struct nf_chain_typePablo Neira Ayuso3-3/+3
Use nft_ prefix. By when I added chain types, I forgot to use the nftables prefix. Rename enum nft_chain_type to enum nft_chain_types too, otherwise there is an overlap. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai10-10/+0
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-24netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6}Subash Abhinov Kasiviswanathan1-2/+4
skb_header_pointer will copy data into a buffer if data is non linear, otherwise it will return a pointer in the linear section of the data. nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later accesses memory within the size of tcphdr (th->doff) in case of TCP packets. This causes a crash when running with KASAN with the following call stack - BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178 Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971 CPU: 2 PID: 28971 Comm: syz-executor Tainted: G B W O 4.9.65+ #1 Call trace: [<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76 [<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226 [<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51 [<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248 [<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline] [<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371 [<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372 [<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline] [<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739 [<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline] [<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178 Fix this by copying data into appropriate size headers based on protocol. Fixes: a583636a83ea ("inet: refactor inet[6]_lookup functions to take skb") Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-22net: Convert nf_ct_net_opsKirill Tkhai1-0/+1
These pernet_operations register and unregister sysctl. Also, there is inet_frags_exit_net() called in exit method, which has to be safe after a560002437d3 "net: Fix hlist corruptions in inet_evict_bucket()". Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20netfilter: ctnetlink: synproxy supportPablo Neira Ayuso1-1/+7
This patch exposes synproxy information per-conntrack. Moreover, send sequence adjustment events once server sends us the SYN,ACK packet, so we can synchronize the sequence adjustment too for packets going as reply from the server, as part of the synproxy logic. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-08net: Convet ipv6_net_opsKirill Tkhai1-0/+1
These pernet_operations are similar to ipv4_net_ops. They are safe to be async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: Convert ip6 tables pernet_operationsKirill Tkhai5-0/+5
The pernet_operations: ip6table_filter_net_ops ip6table_mangle_net_ops ip6table_nat_net_ops ip6table_raw_net_ops ip6table_security_net_ops have exit methods, which call ip6t_unregister_table(). ip6table_filter_net_ops has init method registering filter table. Since there must not be in-flight ipv6 packets at the time of pernet_operations execution and since pernet_operations don't send ipv6 packets each other, these pernet_operations are safe to be async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-15/+7
All of the conflicts were cases of overlapping changes. In net/core/devlink.c, we have to make care that the resouce size_params have become a struct member rather than a pointer to such an object. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05netfilter: x_tables: ensure last rule in base chain matches underflow/policyFlorian Westphal1-1/+16
Harmless from kernel point of view, but again iptables assumes that this is true when decoding ruleset coming from kernel. If a (syzkaller generated) ruleset doesn't have the underflow/policy stored as the last rule in the base chain, then iptables will abort() because it doesn't find the chain policy. libiptc assumes that the policy is the last rule in the basechain, which is only true for iptables-generated rulesets. Unfortunately this needs code duplication -- the functions need the struct layout of the rule head, but that is different for ip/ip6/arptables. NB: pr_warn could be pr_debug but in case this break rulesets somehow its useful to know why blob was rejected. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-05netfilter: compat: prepare xt_compat_init_offsets to return errorsFlorian Westphal1-3/+7
should have no impact, function still always returns 0. This patch is only to ease review. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-05netfilter: x_tables: add counters allocation wrapperFlorian Westphal1-1/+1
allows to have size checks in a single spot. This is supposed to reduce oom situations when fuzz-testing xtables. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-05netfilter: x_tables: move hook entry checks into coreFlorian Westphal1-10/+3
Allow followup patch to change on location instead of three. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-05netfilter: x_tables: check standard verdicts in coreFlorian Westphal1-5/+0
Userspace must provide a valid verdict to the standard target. The verdict can be either a jump (signed int > 0), or a return code. Allowed return codes are either RETURN (pop from stack), NF_ACCEPT, DROP and QUEUE (latter is allowed for legacy reasons). Jump offsets (verdict > 0) are checked in more detail later on when loop-detection is performed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>