|
There is a reference counter to ensure that masquerade modules register
notifiers only once. However, the existing reference counter approach is
not safe. Test commands:
while :
do
modprobe ip6t_MASQUERADE &
modprobe nft_masq_ipv6 &
modprobe -rv ip6t_MASQUERADE &
modprobe -rv nft_masq_ipv6 &
done
The numbers below represent the reference counter.
--------------------------------------------------------
CPU0        CPU1        CPU2        CPU3        CPU4
[insmod]    [insmod]    [rmmod]     [rmmod]     [insmod]
--------------------------------------------------------
0->1
register    1->2
            returns     2->1
                        returns     1->0
                                                0->1
                                                register <--
                                    unregister
--------------------------------------------------------
The unregistration of CPU3 should be processed before the
registration of CPU4.
In order to fix this, use a mutex instead of the reference counter.
The splat looks like:
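The idea can be sketched in userspace as follows. This is only a minimal model, assuming a pthread mutex in place of a kernel mutex. The names masq_get(), masq_put(), fake_register_notifier() and fake_unregister_notifier() are hypothetical and are not the functions touched by the patch. The point is that the counter transition and the register/unregister call sit in one critical section, so the CPU3/CPU4 interleaving above cannot occur.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the real notifier registration calls. */
static int notifier_registered; /* 1 while the fake notifier is registered */

static void fake_register_notifier(void)
{
    assert(!notifier_registered); /* a double register is the bug being modelled */
    notifier_registered = 1;
}

static void fake_unregister_notifier(void)
{
    assert(notifier_registered);
    notifier_registered = 0;
}

static pthread_mutex_t masq_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int masq_users; /* only read or written under masq_lock */

/* Called on module load: the first user registers the notifier. */
void masq_get(void)
{
    pthread_mutex_lock(&masq_lock);
    if (masq_users++ == 0)
        fake_register_notifier();
    pthread_mutex_unlock(&masq_lock);
}

/* Called on module unload: the last user unregisters the notifier. */
void masq_put(void)
{
    pthread_mutex_lock(&masq_lock);
    assert(masq_users > 0);
    if (--masq_users == 0)
        fake_unregister_notifier();
    pthread_mutex_unlock(&masq_lock);
}
```

With the lock held across both steps, a concurrent masq_get() has to wait until a pending unregister has finished before it can register again.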
[ 323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
[ 323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
[ 323.869574] irq event stamp: 194074
[ 323.898930] hardirqs last enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c
[ 323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 323.898930] softirqs last enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b
[ 323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0
[ 323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
[ 323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
[ 323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
[ 323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000
[ 323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0
[ 323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001
[ 323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000
[ 323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0
[ 323.898930] FS: 00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
[ 323.898930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0
[ 323.898930] Call Trace:
[ 323.898930] ? atomic_notifier_chain_register+0x2d0/0x2d0
[ 323.898930] ? down_read+0x150/0x150
[ 323.898930] ? sched_clock_cpu+0x126/0x170
[ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 323.898930] register_netdevice_notifier+0xbb/0x790
[ 323.898930] ? __dev_close_many+0x2d0/0x2d0
[ 323.898930] ? __mutex_unlock_slowpath+0x17f/0x740
[ 323.898930] ? wait_for_completion+0x710/0x710
[ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 323.898930] ? up_write+0x6c/0x210
[ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 324.127073] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 324.127073] nft_chain_filter_init+0x1e/0xe8a [nf_tables]
[ 324.127073] nf_tables_module_init+0x37/0x92 [nf_tables]
[ ... ]
Fixes: 8dd33cc93ec9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
Fixes: be6b635cd674 ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
register_{netdevice/inetaddr/inet6addr}_notifier may return an error
value. This patch adds code to handle these error paths.
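A common shape for such error handling is goto-based unwinding, sketched below in userspace. This is an illustration under assumptions, not the patch itself: fake_register(), fake_unregister(), fail_mask and masq_register_all() are hypothetical, and the three steps are lumped into one function only for brevity.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins: a step fails when its bit is set in fail_mask. */
enum { STEP_NETDEV, STEP_INETADDR, STEP_INET6ADDR, STEP_COUNT };

static unsigned int fail_mask;
static int registered[STEP_COUNT];

static int fake_register(int step)
{
    if (fail_mask & (1u << step))
        return -ENOMEM;
    registered[step] = 1;
    return 0;
}

static void fake_unregister(int step)
{
    assert(registered[step]);
    registered[step] = 0;
}

/* Register all notifiers, undoing the earlier steps if a later one fails. */
int masq_register_all(void)
{
    int ret;

    ret = fake_register(STEP_NETDEV);
    if (ret)
        return ret;

    ret = fake_register(STEP_INETADDR);
    if (ret)
        goto err_netdev;

    ret = fake_register(STEP_INET6ADDR);
    if (ret)
        goto err_inetaddr;

    return 0;

err_inetaddr:
    fake_unregister(STEP_INETADDR);
err_netdev:
    fake_unregister(STEP_NETDEV);
    return ret;
}
```

On failure the caller gets the original error code back and nothing is left registered.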
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We configured iptables as below, which only allowed incoming data on
established connections:
iptables -t mangle -A PREROUTING -m state --state ESTABLISHED -j ACCEPT
iptables -t mangle -P PREROUTING DROP
When deleting a secondary address, the current masquerade implementation
would flush all conntracks on this device. All the established connections
on the primary address would also be deleted, and subsequent incoming data
on those connections would be wrongly dropped because it was identified as
a NEW connection.
So when an address is deleted, only the connections related to that
address should be flushed.
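The difference between the two match rules can be sketched as below. This is a simplified userspace model: struct fake_conn, match_device(), match_device_addr() and flush_addr() are hypothetical names, and a real conntrack entry looks quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified conntrack entry. */
struct fake_conn {
    int      ifindex;   /* device the connection was masqueraded on */
    uint32_t masq_addr; /* local address used for masquerading */
    int      alive;     /* 0 once the entry has been flushed */
};

/* Old rule: every connection on the device matches. */
static int match_device(const struct fake_conn *c, int ifindex)
{
    return c->ifindex == ifindex;
}

/* Rule described above: the removed address has to match as well. */
static int match_device_addr(const struct fake_conn *c, int ifindex,
                             uint32_t addr)
{
    return match_device(c, ifindex) && c->masq_addr == addr;
}

/* Flush only entries bound to the removed address; returns the count. */
size_t flush_addr(struct fake_conn *tbl, size_t n, int ifindex, uint32_t addr)
{
    size_t flushed = 0;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].alive && match_device_addr(&tbl[i], ifindex, addr)) {
            tbl[i].alive = 0;
            flushed++;
        }
    }
    return flushed;
}
```

Connections that use the primary address on the same device stay in the table, so they keep matching ESTABLISHED.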
Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Instead of using extra modules for these, turn the config options into
an implicit dependency that adds the masq feature to the protocol-specific
nf_nat module.
before:
   text    data     bss     dec     hex filename
   2001     860       4    2865     b31 net/ipv4/netfilter/nf_nat_masquerade_ipv4.ko
   5579     780       2    6361    18d9 net/ipv4/netfilter/nf_nat_ipv4.ko
   2860     836       8    3704     e78 net/ipv6/netfilter/nf_nat_masquerade_ipv6.ko
   6648     780       2    7430    1d06 net/ipv6/netfilter/nf_nat_ipv6.ko
after:
   text    data     bss     dec     hex filename
   7245     872       8    8125    1fbd net/ipv4/netfilter/nf_nat_ipv4.ko
   9165     848      12   10025    2729 net/ipv6/netfilter/nf_nat_ipv6.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This is a patch proposal to support shifted ranges in portmaps (e.g. tcp/udp
incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100).
Currently DNAT only works for a single port or identical port ranges (e.g.
ports 5000-5100 on the WAN interface redirected to a LAN host while the
original destination port is not altered). When different port ranges are
configured, either 'random' mode should be used, or else all incoming
connections are mapped onto the first port in the redirect range (in the
described example WAN:5000-5100 will all be mapped to 192.168.1.5:2000).
This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
which uses a base port value to calculate an offset with the destination port
present in the incoming stream. That offset is then applied as index within the
redirect port range (index modulo rangewidth to handle range overflow).
In the described example the base port would be 5000. An incoming stream with
destination port 5004 would result in an offset value of 4, which means that
the NAT'ed stream will be using destination port 2004.
Other possibilities include deterministic mapping of larger or multiple ranges
to a smaller range: WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
50xx)
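The arithmetic described above can be written out as a small helper. This is a sketch only: map_port_offset() is a hypothetical name, ports are taken in host byte order, and the wrap-around for a destination port below the base port is an assumption of this sketch rather than something the text specifies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map dport into [min, max] using its offset from base:
 * new_port = min + (dport - base) mod (max - min + 1)
 */
uint16_t map_port_offset(uint16_t dport, uint16_t base,
                         uint16_t min, uint16_t max)
{
    assert(min <= max);

    uint32_t width = (uint32_t)max - min + 1;  /* ports in the target range */
    uint16_t off = (uint16_t)(dport - base);   /* assumption: wraps mod 2^16 if dport < base */

    return (uint16_t)(min + off % width);
}
```

With base 5000 and target range 2000-2100, port 5004 gives offset 4 and maps to 2004. With base 5000 and target range 5000-5099, port 5432 gives offset 432, which modulo the range width 100 is 32, so it maps to 5032.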
This patch does not change any current behavior. It just adds new NAT proto
range functionality, which must be selected explicitly via the new flag.
A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
which makes this functionality immediately available.
Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch removes NF_CT_ASSERT() and instead uses WARN_ON().
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
|
|
There are several places where we needlessly call nf_ct_iterate_cleanup;
we should instead iterate the full table at module unload time.
This is a leftover from back when the conntrack table got duplicated
per net namespace.
So rename nf_ct_iterate_cleanup to nf_ct_iterate_cleanup_net.
A later patch will then add a non-net variant.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently the nat extension is always attached as soon as the nat module is
loaded. However, most NAT uses do not need the nat extension anymore.
Prepare to remove the add-nat-by-default by making those places that need
it attach it if it's not present yet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When an inetdev is destroyed, every address assigned to the interface
is removed. And in this scenario we do two pointless things which can
be very expensive if the number of assigned addresses is large:
1) Address promotion. We are deleting all addresses, so there is no
point in doing this.
2) A full nf conntrack table purge for every address. We only need to
do this once, as is already caught by the existing
masq_dev_notifier so masq_inet_event() can skip this.
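Point 2 can be modelled roughly as follows. This is a userspace sketch with simplified event ordering. struct fake_inetdev, fake_addr_event(), fake_dev_event() and fake_destroy_inetdev() are hypothetical stand-ins for the address and device notifier paths. The only idea taken from the text is that the per-address handler returns early once the inetdev is marked dead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inetdev: a dead flag and an address count. */
struct fake_inetdev {
    bool dead;   /* set once the device is being destroyed */
    int  naddrs; /* addresses still assigned */
};

static int purge_count; /* number of full-table purges performed */

static void fake_purge(void)
{
    purge_count++;
}

/* Per-address handler: skip the purge while the device is being torn down. */
static void fake_addr_event(const struct fake_inetdev *idev)
{
    if (idev->dead)
        return;
    fake_purge();
}

/* Per-device handler: one purge covers every address on the device. */
static void fake_dev_event(void)
{
    fake_purge();
}

void fake_destroy_inetdev(struct fake_inetdev *idev)
{
    assert(idev->naddrs >= 0);
    idev->dead = true;
    while (idev->naddrs > 0) {
        idev->naddrs--;
        fake_addr_event(idev);
    }
    fake_dev_event();
}
```

In this model, tearing down a device with 1000 addresses performs a single purge instead of 1001.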
Reported-by: Solar Designer <solar@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
|
|
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (i.e. nftables).
The patch includes the addition of an atomic counter to the masquerade
notifier: the work to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.
This factorization only involves IPv4; a similar patch follows to
handle IPv6.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|