aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter/nft_hash.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-09-03netfilter: nft_hash: no need for rcu in the hash set destroy pathPablo Neira Ayuso1-5/+7
The sets are released from the rcu callback, after the rule is removed from the chain list, which implies that nfnetlink cannot update the hashes (thus, no resizing may occur) and no packets are walking on the set anymore. This resolves a lockdep splat in the nft_hash_destroy() path since the nfnl mutex is not held there. =============================== [ INFO: suspicious RCU usage. ] 3.16.0-rc2+ #168 Not tainted ------------------------------- net/netfilter/nft_hash.c:362 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by ksoftirqd/0/3: #0: (rcu_callback){......}, at: [<ffffffff81096393>] rcu_process_callbacks+0x27e/0x4c7 stack backtrace: CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.16.0-rc2+ #168 Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000001 ffff88011769bb98 ffffffff8142c922 0000000000000006 ffff880117694090 ffff88011769bbc8 ffffffff8107c3ff ffff8800cba52400 ffff8800c476bea8 ffff8800c476bea8 ffff8800cba52400 ffff88011769bc08 Call Trace: [<ffffffff8142c922>] dump_stack+0x4e/0x68 [<ffffffff8107c3ff>] lockdep_rcu_suspicious+0xfa/0x103 [<ffffffffa079931e>] nft_hash_destroy+0x50/0x137 [nft_hash] [<ffffffffa078cd57>] nft_set_destroy+0x11/0x2a [nf_tables] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Thomas Graf <tgraf@suug.ch>
2014-08-02nftables: Convert nft_hash to use generic rhashtableThomas Graf1-236/+55
The sizing of the hash table and the practice of requiring a lookup to retrieve the pprev to be stored in the element cookie before the deletion of an entry is left intact. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Patrick McHardy <kaber@trash.net> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05net: use the new API kvfree()WANG Cong1-4/+1
It is available since v3.15-rc5. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-02netfilter: nft_hash: use set global element counter instead of private onePatrick McHardy1-7/+2
Now that nf_tables performs global accounting of set elements, it is not needed in the hash type anymore. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-02netfilter: nf_tables: implement proper set selectionPatrick McHardy1-2/+43
The current set selection simply choses the first set type that provides the requested features, which always results in the rbtree being chosen by virtue of being the first set in the list. What we actually want to do is choose the implementation that can provide the requested features and is optimal from either a performance or memory perspective depending on the characteristics of the elements and the preferences specified by the user. The elements are not known when creating a set. Even if we would provide them for anonymous (literal) sets, we'd still have standalone sets where the elements are not known in advance. We therefore need an abstract description of the data charcteristics. The kernel already knows the size of the key, this patch starts by introducing a nested set description which so far contains only the maximum amount of elements. Based on this the set implementations are changed to provide an estimate of the required amount of memory and the lookup complexity class. The set ops have a new callback ->estimate() that is invoked during set selection. It receives a structure containing the attributes known to the kernel and is supposed to populate a struct nft_set_estimate with the complexity class and, in case the size is known, the complete amount of memory required, or the amount of memory required per element otherwise. Based on the policy specified by the user (performance/memory, defaulting to performance) the kernel will then select the best suited implementation. Even if the set implementation would allow to add more than the specified maximum amount of elements, they are enforced since new implementations might not be able to add more than maximum based on which they were selected. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-18netfilter: Add missing vmalloc.h include to nft_hash.cDavid S. Miller1-0/+1
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07netfilter: nft_hash: bug fixes and resizingPatrick McHardy1-46/+214
The hash set type is very broken and was never meant to be merged in this state. Missing RCU synchronization on element removal, leaking chain refcounts when used as a verdict map, races during lookups, a fixed table size are probably just some of the problems. Luckily it is currently never chosen by the kernel when the rbtree type is also available. Rewrite it to be usable. The new implementation supports automatic hash table resizing using RCU, based on Paul McKenney's and Josh Triplett's algorithm "Optimized Resizing For RCU-Protected Hash Tables" described in [1]. Resizing doesn't require a second list head in the elements, it works by chosing a hash function that remaps elements to a predictable set of buckets, only resizing by integral factors and - during expansion: linking new buckets to the old bucket that contains elements for any of the new buckets, thereby creating imprecise chains, then incrementally seperating the elements until the new buckets only contain elements that hash directly to them. - during shrinking: linking the hash chains of all old buckets that hash to the same new bucket to form a single chain. Expansion requires at most the number of elements in the longest hash chain grace periods, shrinking requires a single grace period. Due to the requirement of having hash chains/elements linked to multiple buckets during resizing, homemade single linked lists are used instead of the existing list helpers, that don't support this in a clean fashion. As a side effect, the amount of memory required per element is reduced by one pointer. Expansion is triggered when the load factors exceeds 75%, shrinking when the load factor goes below 30%. Both operations are allowed to fail and will be retried on the next insertion or removal if their respective conditions still hold. [1] http://dl.acm.org/citation.cfm?id=2002181.2002192 Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add netlink set APIPatrick McHardy1-223/+106
This patch adds the new netlink API for maintaining nf_tables sets independently of the ruleset. The API supports the following operations: - creation of sets - deletion of sets - querying of specific sets - dumping of all sets - addition of set elements - removal of set elements - dumping of all set elements Sets are identified by name, each table defines an individual namespace. The name of a set may be allocated automatically, this is mostly useful in combination with the NFT_SET_ANONYMOUS flag, which destroys a set automatically once the last reference has been released. Sets can be marked constant, meaning they're not allowed to change while linked to a rule. This allows to perform lockless operation for set types that would otherwise require locking. Additionally, if the implementation supports it, sets can (as before) be used as maps, associating a data value with each key (or range), by specifying the NFT_SET_MAP flag and can be used for interval queries by specifying the NFT_SET_INTERVAL flag. Set elements are added and removed incrementally. All element operations support batching, reducing netlink message and set lookup overhead. The old "set" and "hash" expressions are replaced by a generic "lookup" expression, which binds to the specified set. Userspace is not aware of the actual set implementation used by the kernel anymore, all configuration options are generic. Currently the implementation selection logic is largely missing and the kernel will simply use the first registered implementation supporting the requested operation. Eventually, the plan is to have userspace supply a description of the data characteristics and select the implementation based on expected performance and memory use. This patch includes the new 'lookup' expression to look up for element matching in the set. This patch includes kernel-doc descriptions for this set API and it also includes the following fixes. From Patrick McHardy: * netfilter: nf_tables: fix set element data type in dumps * netfilter: nf_tables: fix indentation of struct nft_set_elem comments * netfilter: nf_tables: fix oops in nft_validate_data_load() * netfilter: nf_tables: fix oops while listing sets of built-in tables * netfilter: nf_tables: destroy anonymous sets immediately if binding fails * netfilter: nf_tables: propagate context to set iter callback * netfilter: nf_tables: add loop detection From Pablo Neira Ayuso: * netfilter: nf_tables: allow to dump all existing sets * netfilter: nf_tables: fix wrong type for flags variable in newelem Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: add nftablesPatrick McHardy1-0/+348
This patch adds nftables which is the intended successor of iptables. This packet filtering framework reuses the existing netfilter hooks, the connection tracking system, the NAT subsystem, the transparent proxying engine, the logging infrastructure and the userspace packet queueing facilities. In a nutshell, nftables provides a pseudo-state machine with 4 general purpose registers of 128 bits and 1 specific purpose register to store verdicts. This pseudo-machine comes with an extensible instruction set, a.k.a. "expressions" in the nftables jargon. The expressions included in this patch provide the basic functionality, they are: * bitwise: to perform bitwise operations. * byteorder: to change from host/network endianess. * cmp: to compare data with the content of the registers. * counter: to enable counters on rules. * ct: to store conntrack keys into register. * exthdr: to match IPv6 extension headers. * immediate: to load data into registers. * limit: to limit matching based on packet rate. * log: to log packets. * meta: to match metainformation that usually comes with the skbuff. * nat: to perform Network Address Translation. * payload: to fetch data from the packet payload and store it into registers. * reject (IPv4 only): to explicitly close connection, eg. TCP RST. Using this instruction-set, the userspace utility 'nft' can transform the rules expressed in human-readable text representation (using a new syntax, inspired by tcpdump) to nftables bytecode. nftables also inherits the table, chain and rule objects from iptables, but in a more configurable way, and it also includes the original datatype-agnostic set infrastructure with mapping support. This set infrastructure is enhanced in the follow up patch (netfilter: nf_tables: add netlink set API). This patch includes the following components: * the netlink API: net/netfilter/nf_tables_api.c and include/uapi/netfilter/nf_tables.h * the packet filter core: net/netfilter/nf_tables_core.c * the expressions (described above): net/netfilter/nft_*.c * the filter tables: arp, IPv4, IPv6 and bridge: net/ipv4/netfilter/nf_tables_ipv4.c net/ipv6/netfilter/nf_tables_ipv6.c net/ipv4/netfilter/nf_tables_arp.c net/bridge/netfilter/nf_tables_bridge.c * the NAT table (IPv4 only): net/ipv4/netfilter/nf_table_nat_ipv4.c * the route table (similar to mangle): net/ipv4/netfilter/nf_table_route_ipv4.c net/ipv6/netfilter/nf_table_route_ipv6.c * internal definitions under: include/net/netfilter/nf_tables.h include/net/netfilter/nf_tables_core.h * It also includes an skeleton expression: net/netfilter/nft_expr_template.c and the preliminary implementation of the meta target net/netfilter/nft_meta_target.c It also includes a change in struct nf_hook_ops to add a new pointer to store private data to the hook, that is used to store the rule list per chain. This patch is based on the patch from Patrick McHardy, plus merged accumulated cleanups, fixes and small enhancements to the nftables code that has been done since 2009, which are: From Patrick McHardy: * nf_tables: adjust netlink handler function signatures * nf_tables: only retry table lookup after successful table module load * nf_tables: fix event notification echo and avoid unnecessary messages * nft_ct: add l3proto support * nf_tables: pass expression context to nft_validate_data_load() * nf_tables: remove redundant definition * nft_ct: fix maxattr initialization * nf_tables: fix invalid event type in nf_tables_getrule() * nf_tables: simplify nft_data_init() usage * nf_tables: build in more core modules * nf_tables: fix double lookup expression unregistation * nf_tables: move expression initialization to nf_tables_core.c * nf_tables: build in payload module * nf_tables: use NFPROTO constants * nf_tables: rename pid variables to portid * nf_tables: save 48 bits per rule * nf_tables: introduce chain rename * nf_tables: check for duplicate names on chain rename * nf_tables: remove ability to specify handles for new rules * nf_tables: return error for rule change request * nf_tables: return error for NLM_F_REPLACE without rule handle * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification * nf_tables: fix NLM_F_MULTI usage in netlink notifications * nf_tables: include NLM_F_APPEND in rule dumps From Pablo Neira Ayuso: * nf_tables: fix stack overflow in nf_tables_newrule * nf_tables: nft_ct: fix compilation warning * nf_tables: nft_ct: fix crash with invalid packets * nft_log: group and qthreshold are 2^16 * nf_tables: nft_meta: fix socket uid,gid handling * nft_counter: allow to restore counters * nf_tables: fix module autoload * nf_tables: allow to remove all rules placed in one chain * nf_tables: use 64-bits rule handle instead of 16-bits * nf_tables: fix chain after rule deletion * nf_tables: improve deletion performance * nf_tables: add missing code in route chain type * nf_tables: rise maximum number of expressions from 12 to 128 * nf_tables: don't delete table if in use * nf_tables: fix basechain release From Tomasz Bursztyka: * nf_tables: Add support for changing users chain's name * nf_tables: Change chain's name to be fixed sized * nf_tables: Add support for replacing a rule by another one * nf_tables: Update uapi nftables netlink header documentation From Florian Westphal: * nft_log: group is u16, snaplen u32 From Phil Oester: * nf_tables: operational limit match Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>