Age | Commit message (Collapse) | Author | Files | Lines |
|
Fix wraparound bug which could lead to memory exhaustion when adding an
x.x.x.x-255.255.255.255 range to any hash:*net* types.
Fixes Netfilter's bugzilla id #1212, reported by Thomas Schwark.
Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Wrong comparison prevented the hash types to add a range with more than
2^31 addresses but reported as a success.
Fixes Netfilter's bugzilla id #1005, reported by Oleg Serditov and
Oliver Ford.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Before this patch struct htype created at the first source
of ip_set_hash_gen.h and it is common for both IPv4 and IPv6
set variants.
Make struct htype per ipset family and use NLEN to make
nets array fixed size to simplify struct htype allocation.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Place pointer to hook state in xt_action_param structure instead of
copying the fields that we need. After this change xt_action_param fits
into one cacheline.
This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Jozsef says:
The correct behaviour is that if we have
ipset create test1 hash:net,iface
ipset add test1 0.0.0.0/0,eth0
iptables -A INPUT -m set --match-set test1 src,src
then the rule should match for any traffic coming in through eth0.
This removes the -EINVAL runtime test to make matching work
in case packet arrived via the specified interface.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1297092
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Three types of data need to be protected in the case of the hash types:
a. The hash buckets: standard rcu pointer operations are used.
b. The element blobs in the hash buckets are stored in an array and
a bitmap is used for book-keeping to tell which elements in the array
are used or free.
c. Networks per cidr values and the cidr values themselves are stored
in fix sized arrays and need no protection. The values are modified
in such an order that in the worst case an element testing is repeated
once with the same cidr value.
The ipset hash approach uses arrays instead of lists and therefore is
incompatible with rhashtable.
Performance is tested by Jesper Dangaard Brouer:
Simple drop in FORWARD
~~~~~~~~~~~~~~~~~~~~~~
Dropping via simple iptables net-mask match::
iptables -t raw -N simple || iptables -t raw -F simple
iptables -t raw -I simple -s 198.18.0.0/15 -j DROP
iptables -t raw -D PREROUTING -j simple
iptables -t raw -I PREROUTING -j simple
Drop performance in "raw": 11.3Mpps
Generator: sending 12.2Mpps (tx:12264083 pps)
Drop via original ipset in RAW table
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Create a set with lots of elements::
sudo ./ipset destroy test
echo "create test hash:ip hashsize 65536" > test.set
for x in `seq 0 255`; do
for y in `seq 0 255`; do
echo "add test 198.18.$x.$y" >> test.set
done
done
sudo ./ipset restore < test.set
Dropping via ipset::
iptables -t raw -F
iptables -t raw -N net198 || iptables -t raw -F net198
iptables -t raw -I net198 -m set --match-set test src -j DROP
iptables -t raw -I PREROUTING -j net198
Drop performance in "raw" with ipset: 8Mpps
Perf report numbers ipset drop in "raw"::
+ 24.65% ksoftirqd/1 [ip_set] [k] ip_set_test
- 21.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_lock_bh
- _raw_read_lock_bh
+ 99.88% ip_set_test
- 19.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_unlock_bh
- _raw_read_unlock_bh
+ 99.72% ip_set_test
+ 4.31% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_kadt
+ 2.27% ksoftirqd/1 [ixgbe] [k] ixgbe_fetch_rx_buffer
+ 2.18% ksoftirqd/1 [ip_tables] [k] ipt_do_table
+ 1.81% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_test
+ 1.61% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core
+ 1.44% ksoftirqd/1 [kernel.kallsyms] [k] build_skb
+ 1.42% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv
+ 1.36% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip
+ 1.16% ksoftirqd/1 [kernel.kallsyms] [k] dev_gro_receive
+ 1.09% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock
+ 0.96% ksoftirqd/1 [ixgbe] [k] ixgbe_clean_rx_irq
+ 0.95% ksoftirqd/1 [kernel.kallsyms] [k] __netdev_alloc_frag
+ 0.88% ksoftirqd/1 [kernel.kallsyms] [k] kmem_cache_alloc
+ 0.87% ksoftirqd/1 [xt_set] [k] set_match_v3
+ 0.85% ksoftirqd/1 [kernel.kallsyms] [k] inet_gro_receive
+ 0.83% ksoftirqd/1 [kernel.kallsyms] [k] nf_iterate
+ 0.76% ksoftirqd/1 [kernel.kallsyms] [k] put_compound_page
+ 0.75% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_lock
Drop via ipset in RAW table with RCU-locking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With RCU locking, the RW-lock is gone.
Drop performance in "raw" with ipset with RCU-locking: 11.3Mpps
Performance-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Remove rbtree in order to introduce RCU instead of rwlock in ipset
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Commit "Simplify cidr handling for hash:*net* types" broke the cidr
handling for the hash:*net* types when the sets were used by the SET
target: entries with invalid cidr values were added to the sets.
Reported by Jonathan Johnson.
Testsuite entry is added to verify the fix.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
There is no reason to check CIDR value regardless attribute
specifying CIDR is given.
Initialize cidr array in element structure on element structure
declaration to let more freedom to the compiler to optimize
initialization right before element structure is used.
Remove local variables cidr and cidr2 for netnet and netportnet
hashes as we do not use packed cidr value for such set types and
can store value directly in e.cidr[].
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Even if we return with generic IPSET_ERR_PROTOCOL it is good idea
to return line number if we called in batch mode.
Moreover we are not always exiting with IPSET_ERR_PROTOCOL. For
example hash:ip,port,net may return IPSET_ERR_HASH_RANGE_UNSUPPORTED
or IPSET_ERR_INVALID_CIDR.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Make all extensions attributes checks within ip_set_get_extensions()
and reduce number of duplicated code.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
HKEY_DATALEN remains defined after first inclusion
of ip_set_hash_gen.h, so it is incorrectly reused
for IPv6 code.
Undefine HKEY_DATALEN in ip_set_hash_gen.h at the end.
Also remove some useless defines of HKEY_DATALEN in
ip_set_hash_{ip{,mark,port},netiface}.c as ip_set_hash_gen.h
defines it correctly for such set types anyway.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Ensure userspace supplies string not longer than
IPSET_MAX_COMMENT_SIZE.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Statement ret = func1() || func2() returns 0 when both func1()
and func2() return 0, or 1 if func1() or func2() returns non-zero.
However in our case func1() and func2() returns error code on
failure, so it seems good to propagate such error codes, rather
than returning 1 in case of failure.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
* Undefine mtype_data_reset_elem before defining.
* Remove duplicated mtype_gc_init undefine, move
mtype_gc_init define closer to mtype_gc define.
* Use htype instead of HTYPE in IPSET_TOKEN(HTYPE, _create)().
* Remove PF definition from sets: no more used.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
right now we store this in the nf_bridge_info struct, accessible
via skb->nf_bridge. This patch prepares removal of this pointer from skb:
Instead of using skb->nf_bridge->x, we use helpers to obtain the in/out
device (or ifindexes).
Followup patches to netfilter will then allow nf_bridge_info to be
obtained by a call into the br_netfilter core, rather than keeping a
pointer to it in sk_buff.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the bridge netfilter code has been modularized.
Use IS_ENABLED instead of ifdef to cover the module case.
Fixes: 34666d4 ("netfilter: bridge: move br_netfilter out of the core")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add skbinfo extension kernel support for the hash set types.
Inroduce the new revisions of all hash set types.
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Adds a new property for hash set types, where if a set is created
with the 'forceadd' option and the set becomes full the next addition
to the set may succeed and evict a random entry from the set.
To keep overhead low eviction is done very simply. It checks to see
which bucket the new entry would be added. If the bucket's pos value
is non-zero (meaning there's at least one entry in the bucket) it
replaces the first entry in the bucket. If pos is zero, then it continues
down the normal add process.
This property is useful if you have a set for 'ban' lists where it may
not matter if you release some entries from the set early.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This provides kernel support for creating ipsets with comment support.
This does incur a penalty to flushing/destroying an ipset since all
entries are walked in order to free the allocated strings, this penalty
is of course less expensive than the operation of listing an ipset to
userspace, so for general-purpose usage the overall impact is expected
to be little to none.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Get rid of the structure based extensions and introduce a blob for
the extensions. Thus we can support more extension types easily.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Default timeout and extension offsets are moved to struct set, because
all set types supports all extensions and it makes possible to generalize
extension support.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
In order to support hash:net,net, hash:net,port,net etc. types,
arrays are introduced for the book-keeping of existing cidr sizes
and network numbers in a set.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
The "nomatch" commandline flag should invert the matching at testing,
similarly to the --return-nomatch flag of the "set" match of iptables.
Until now it worked with the elements with "nomatch" flag only. From
now on it works with elements without the flag too, i.e:
# ipset n test hash:net
# ipset a test 10.0.0.0/24 nomatch
# ipset t test 10.0.0.1
10.0.0.1 is NOT in set test.
# ipset t test 10.0.0.1 nomatch
10.0.0.1 is in set test.
# ipset a test 192.168.0.0/24
# ipset t test 192.168.0.1
192.168.0.1 is in set test.
# ipset t test 192.168.0.1 nomatch
192.168.0.1 is NOT in set test.
Before the patch the results were
...
# ipset t test 192.168.0.1
192.168.0.1 is in set test.
# ipset t test 192.168.0.1 nomatch
192.168.0.1 is in set test.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If a resize is triggered the nomatch flag is not excluded at hashing,
which leads to the element missed at lookup in the resized set.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
attribute is copied to IFNAMSIZ-size stack variable,
but IFNAMSIZ is smaller than IPSET_MAXNAMELEN.
Fortunately nfnetlink needs CAP_NET_ADMIN.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Exceptions can now be matched and we can branch according to the
possible cases:
a. match in the set if the element is not flagged as "nomatch"
b. match in the set if the element is flagged with "nomatch"
c. no match
i.e.
iptables ... -m set --match-set ... -j ...
iptables ... -m set --match-set ... --nomatch-entries -j ...
...
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Now it is possible to setup a single hash:net,iface type of set and
a single ip6?tables match which covers all egress/ingress filtering.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
ifname_compare() assumes that skb->dev is zero-padded,
e.g 'eth1\0\0\0\0\0...'. This isn't always the case. e1000 driver does
strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
in e1000_probe(), so once device is registered dev->name memory contains
'eth1\0:0:3\0\0\0' (or something like that), which makes eth1 compare
fail.
Use plain strcmp() instead.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
The hash size must fit both into u32 (jhash) and the max value of
size_t. The missing checking could lead to kernel crash, bug reported
by Seblu.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "nomatch" keyword and option is added to the hash:*net* types,
by which one can add exception entries to sets. Example:
ipset create test hash:net
ipset add test 192.168.0/24
ipset add test 192.168.0/30 nomatch
In this case the IP addresses from 192.168.0/24 except 192.168.0/30
match the elements of the set.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ipset is actually using NFPROTO values rather than AF (xt_set passes
that along).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If overlapping networks with different interfaces was added to
the set, the type did not handle it properly. Example
ipset create test hash:net,iface
ipset add test 192.168.0.0/16,eth0
ipset add test 192.168.0.0/24,eth1
Now, if a packet was sent from 192.168.0.0/24,eth0, the type returned
a match.
In the patch the algorithm is fixed in order to correctly handle
overlapping networks.
Limitation: the same network cannot be stored with more than 64 different
interfaces in a single set.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
The hash:net,iface type makes possible to store network address and
interface name pairs in a set. It's mostly suitable for egress
and ingress filtering. Examples:
# ipset create test hash:net,iface
# ipset add test 192.168.0.0/16,eth0
# ipset add test 192.168.0.0/24,eth1
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|