Age | Commit message | Author | Files | Lines |
|
We cannot access the skb->_nfct field when CONFIG_NF_CONNTRACK is
disabled:
net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag':
net/ipv4/netfilter/nf_defrag_ipv4.c:83:9: error: 'struct sk_buff' has no member named '_nfct'
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag':
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68:9: error: 'struct sk_buff' has no member named '_nfct'
Both functions already have an #ifdef for this, so let's move the
check in there.
Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
conntrack defrag is needed only if some module like CONNTRACK or NAT
explicitly requests it. For plain forwarding scenarios, defrag is
not needed and can be skipped if NOTRACK is set in a rule.
Since conntrack defrag is currently higher priority than raw table,
setting NOTRACK is not sufficient. We need to move raw to a higher
priority for iptables only.
This is achieved by introducing a module parameter "raw_before_defrag"
which allows changing the priority of the raw table to place it before
defrag. By default, the parameter is disabled and the priority of raw
table is NF_IP_PRI_RAW to support legacy behavior. If the module
parameter is enabled, then the priority of the raw table is set to
NF_IP_PRI_RAW_BEFORE_DEFRAG.
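
The effect of the parameter on hook ordering can be sketched in userspace C. The three priority values are those from include/uapi/linux/netfilter_ipv4.h; the two helper functions are hypothetical names used only for illustration and are not the module's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Priority values as defined in include/uapi/linux/netfilter_ipv4.h. */
enum {
	NF_IP_PRI_RAW_BEFORE_DEFRAG = -450,
	NF_IP_PRI_CONNTRACK_DEFRAG  = -400,
	NF_IP_PRI_RAW               = -300,
};

/* Hypothetical model of the priority chosen for the raw table. */
static int raw_table_priority(bool raw_before_defrag)
{
	return raw_before_defrag ? NF_IP_PRI_RAW_BEFORE_DEFRAG : NF_IP_PRI_RAW;
}

/* Hooks on one chain run in ascending priority order, so the raw table
 * sees a packet before defrag only if its priority is numerically lower. */
static bool raw_runs_before_defrag(bool raw_before_defrag)
{
	int prio = raw_table_priority(raw_before_defrag);

	assert(prio != NF_IP_PRI_CONNTRACK_DEFRAG);
	return prio < NF_IP_PRI_CONNTRACK_DEFRAG;
}
```

With the parameter off, the raw table runs at -300, after defrag at -400, so a NOTRACK rule is evaluated too late to skip reassembly. With it on, the raw table runs at -450 and is evaluated first.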
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We no longer place these on a list so they can be const.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
A followup patch renames skb->nfct and changes its type, so add a helper to
avoid an intrusive rename later.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nf_defrag modules for ipv4 and ipv6 export an empty stub function.
Any module that needs the defragmentation hooks registered simply 'calls'
this empty function to create a phony module dependency -- modprobe will
then load the defrag module too.
This extends netfilter ipv4/ipv6 defragmentation modules to delay the hook
registration until the functionality is requested within a network namespace
instead of module load time for all namespaces.
Hooks are only un-registered on module unload or when a namespace that used
such defrag functionality exits.
We have to use struct net for this as the register hooks can be called
before netns initialization here from the ipv4/ipv6 conntrack module
init path.
There is no unregister functionality support, defrag will always be
active once it was requested inside a net namespace.
The reason is that defrag has impact on nft and iptables rulesets
(without defrag we might see fragments).
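
The on-demand registration described above can be modelled as follows. This is a minimal userspace sketch: struct model_net and both functions are hypothetical stand-ins for the per-namespace state and entry points, not the kernel's actual names or signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace state kept in struct net. */
struct model_net {
	bool defrag_enabled;	/* set once hooks are registered for this netns */
	int  hooks_registered;	/* registration count, for illustration only */
};

/* Called by any user (conntrack, NAT, ...) that needs defrag in this netns. */
static int model_defrag_enable(struct model_net *net)
{
	assert(net != NULL);
	if (net->defrag_enabled)
		return 0;		/* already active: nothing to do */
	net->hooks_registered++;	/* stands in for registering the hooks */
	net->defrag_enabled = true;
	return 0;
}

/* Runs on netns exit or module unload: the only place hooks are removed. */
static void model_defrag_net_exit(struct model_net *net)
{
	assert(net != NULL);
	if (net->defrag_enabled) {
		net->hooks_registered--;
		net->defrag_enabled = false;
	}
}
```

There is deliberately no per-user disable call in the model: once one user has requested defrag in a namespace, it stays active until that namespace goes away.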
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since commit 0848f6428ba3 ("inet: frags: fix defragmented packet's IP
header for af_packet"), ip_send_check() would be called twice for
defragmentation that occurs from netfilter ipv4 defrag hooks. Remove the
extra call.
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Later parts of the stack (including fragmentation) expect that there is
never a socket attached to a frag in a frag_list, however this invariant
was not enforced on all defrag paths. This could lead to the
BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
end of this commit message.
While the call could be added to openvswitch to fix this particular
error, the head and tail of the frags list are already orphaned
indirectly inside ip_defrag(), so it seems like the remaining fragments
should all be orphaned in all circumstances.
kernel BUG at net/ipv4/ip_output.c:586!
[...]
Call Trace:
<IRQ>
[<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
[<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
[<ffffffff81667830>] ? dst_discard_out+0x20/0x20
[<ffffffff81667810>] ? dst_ifdown+0x80/0x80
[<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
[<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
[<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
[<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
[<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
[<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
[<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
[<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
[<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
[<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
[<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
[<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
[<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
[<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
[<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
[<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
[<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
[<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
[<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
[<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
[<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
[<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
[<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
[<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
[<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
[<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
[<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
[<ffffffff81661060>] dev_queue_xmit+0x10/0x20
[<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
[<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
[<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
[<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
[<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
[<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
[<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
[<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
[<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
[<ffffffff816ab4c0>] ip_output+0x70/0x110
[<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
[<ffffffff816aa9f9>] ip_local_out+0x39/0x70
[<ffffffff816abf89>] ip_send_skb+0x19/0x40
[<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
[<ffffffff816df21a>] icmp_push_reply+0xea/0x120
[<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
[<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
[<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
[<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
[<ffffffff816dfa06>] icmp_echo+0x36/0x70
[<ffffffff816e0d11>] icmp_rcv+0x271/0x450
[<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
[<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
[<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
[<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
[<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
[<ffffffff816a5453>] ip_rcv+0x283/0x3e0
[<ffffffff810b6302>] ? match_held_lock+0x192/0x200
[<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
[<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
[<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
[<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
[<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
[<ffffffff8165e678>] process_backlog+0x78/0x230
[<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
[<ffffffff8165e355>] net_rx_action+0x155/0x400
[<ffffffff8106b48c>] __do_softirq+0xcc/0x420
[<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
[<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
<EOI>
[<ffffffff8106b88e>] do_softirq+0x4e/0x60
[<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
[<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
[<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
[<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
[<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
[<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
[<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
[<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
[<ffffffff816ab4c0>] ip_output+0x70/0x110
[<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
[<ffffffff816aa9f9>] ip_local_out+0x39/0x70
[<ffffffff816abf89>] ip_send_skb+0x19/0x40
[<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
[<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
[<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
[<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
[<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
[<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
[<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
[<ffffffff8163e398>] sock_sendmsg+0x38/0x50
[<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
[<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
[<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
[<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
[<ffffffff81204886>] ? __fget_light+0x66/0x90
[<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
[<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
[<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
RIP [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
RSP <ffff88006d603170>
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before converting a 'socket pointer' into inet socket,
use sk_fullsock() to detect timewait or request sockets.
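
The guard can be sketched in userspace C as below. The state numbers match include/net/tcp_states.h and the family value matches PF_INET, but struct model_sock and both functions are hypothetical models rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State numbers as in include/net/tcp_states.h, prefixed to avoid clashes. */
enum {
	MODEL_TCP_ESTABLISHED  = 1,
	MODEL_TCP_TIME_WAIT    = 6,
	MODEL_TCP_NEW_SYN_RECV = 12,
};

#define MODEL_PF_INET 2

/* Hypothetical socket model. In the kernel, timewait and request sockets
 * are smaller objects, so inet-only fields must not be read from them. */
struct model_sock {
	int  state;
	int  family;
	bool nodefrag;	/* only meaningful for full inet sockets */
};

/* Model of sk_fullsock(): false for timewait and request sockets. */
static bool model_sk_fullsock(const struct model_sock *sk)
{
	assert(sk != NULL);
	return sk->state != MODEL_TCP_TIME_WAIT &&
	       sk->state != MODEL_TCP_NEW_SYN_RECV;
}

/* Only consult nodefrag after every precondition has been checked. */
static bool model_wants_nodefrag(const struct model_sock *sk)
{
	if (sk == NULL || !model_sk_fullsock(sk))
		return false;
	if (sk->family != MODEL_PF_INET)
		return false;
	return sk->nodefrag;
}
```

The order of the checks is the point: the inet-specific field is read last, and only for sockets that are known to carry it.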
Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This merge resolves conflicts with 75aec9df3a78 ("bridge: Remove
br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve
netns support in the network stack that reached upstream via David's
net-next tree.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
net/bridge/br_netfilter_hooks.c
|
|
Since commit 8405a8fff3f8 ("netfilter: nf_qeueue: Drop queue entries on
nf_unregister_hook") all pending queued entries are discarded.
So we can simply remove all of the owner handling -- when a module is
removed it also needs to unregister all its hooks.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The function ip_defrag is called on both the input and the output
paths of the networking stack. In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.
So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Only pass the void *priv parameter out of the nf_hook_ops. That is
all any of the functions are interested in now, and by limiting what is
passed it becomes simpler to change implementation details.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The values of nf_hook_state.hook and nf_hook_ops.hooknum must be the
same by definition.
We are more likely to access the fields in nf_hook_state over the
fields in nf_hook_ops so with a little luck this results in
fewer cache line misses, and slightly more consistent code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
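
The direction-dependent zone comparison can be sketched as follows. This is a userspace model under stated assumptions: the struct layout, the helper names and the default zone id of 0 are illustrative, not copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_ct_dir { MODEL_DIR_ORIGINAL = 0, MODEL_DIR_REPLY = 1 };

/* Direction bitmask: one bit per conntrack direction. */
#define MODEL_ZONE_DIR_ORIG	(1u << MODEL_DIR_ORIGINAL)
#define MODEL_ZONE_DIR_REPL	(1u << MODEL_DIR_REPLY)
#define MODEL_ZONE_DIR_BOTH	(MODEL_ZONE_DIR_ORIG | MODEL_ZONE_DIR_REPL)
#define MODEL_DEFAULT_ZONE_ID	0	/* assumed value of the default zone id */

struct model_zone {
	uint16_t id;
	uint8_t  dir;	/* directions in which the id applies */
};

/* Return the zone id if it applies to this direction, else the default. */
static uint16_t model_zone_id(const struct model_zone *zone,
			      enum model_ct_dir dir)
{
	assert(zone != NULL);
	return (zone->dir & (1u << dir)) ? zone->id : MODEL_DEFAULT_ZONE_ID;
}

/* Equality check as used when comparing tuples in a given direction. */
static int model_zone_equal(const struct model_zone *a,
			    const struct model_zone *b,
			    enum model_ct_dir dir)
{
	return model_zone_id(a, dir) == model_zone_id(b, dir);
}
```

Two tenants with zones 1 and 2 applied to the original direction only therefore stay separate for original tuples, while both resolve to the shared default zone for reply tuples.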
Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below is a minimal, simplified collision example (script in [2]) with
netperf sessions:
+--- tenant-1 ---+ mark := 1
| netperf |--+
+----------------+ | CT zone := mark [ORIGINAL]
[ip,sport] := X +--------------+ +--- gateway ---+
| mark routing |--| SNAT |-- ... +
+--------------+ +---------------+ |
+--- tenant-2 ---+ | ~~~|~~~
| netperf |--+ +-----------+ |
+----------------+ mark := 2 | netserver |------ ... +
[ip,sport] := X +-----------+
[ip,port] := Y
On the gateway netns, example:
iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.
I also ran various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch replaces the zone id which is pushed down into functions
with the actual zone object. It's a bigger one-time change, but
needed for later on extending zones with a direction parameter, and
thus decoupling this additional information from all call-sites.
No functional changes in this patch.
The default zone becomes a global const object, namely nf_ct_zone_dflt
and will be returned directly in various cases, one being, when there's
f.e. no zoning support.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We can use union for most of the temporary cruft (original ipv4/ipv6
address, source mac, physoutdev) since they're used during different
stages of br netfilter traversal.
Also get rid of the last two ->mask users.
Shrinks struct from 48 to 32 on 64bit arch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pass the nf_hook_state all the way down into the hook
functions themselves.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the bridge netfilter code has been modularized.
Use IS_ENABLED instead of ifdef to cover the module case.
Fixes: 34666d4 ("netfilter: bridge: move br_netfilter out of the core")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
replace:
#if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
with
#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
replace:
#if !defined(CONFIG_NF_NAT) && !defined(CONFIG_NF_NAT_MODULE)
with
#if !IS_ENABLED(CONFIG_NF_NAT)
replace:
#if !defined(CONFIG_NF_CONNTRACK) && !defined(CONFIG_NF_CONNTRACK_MODULE)
with
#if !IS_ENABLED(CONFIG_NF_CONNTRACK)
And add missing:
IS_ENABLED(CONFIG_NF_CT_NETLINK)
in net/ipv{4,6}/netfilter/nf_nat_l3proto_ipv{4,6}.c
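
Why the two spellings are equivalent can be shown with a simplified userspace re-creation of the preprocessor trick behind IS_ENABLED. The MODEL_ macros and the CONFIG_FOO, CONFIG_BAR and CONFIG_BAZ options are hypothetical; the real macro in include/linux/kconfig.h is structured somewhat differently.

```c
#include <assert.h>

/* Hypothetical Kconfig results: FOO is a module, BAR is built in, BAZ is off. */
#define CONFIG_FOO_MODULE 1
#define CONFIG_BAR 1

/* An option defined to 1 turns into "0," and shifts a 1 into second place. */
#define MODEL_ARG_PLACEHOLDER_1 0,
#define model_take_second_arg(ignored, val, ...) val
#define model_is_defined(x)       model_is_defined_1(x)
#define model_is_defined_1(val)   model_is_defined_2(MODEL_ARG_PLACEHOLDER_##val)
#define model_is_defined_2(junk)  model_take_second_arg(junk 1, 0, 0)

/* True if the option is built in (=y) or built as a module (=m). */
#define MODEL_IS_ENABLED(option) \
	(model_is_defined(option) || model_is_defined(option##_MODULE))

/* Old spelling: both symbols have to be tested by hand. */
static int foo_enabled_old_style(void)
{
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)
	return 1;
#else
	return 0;
#endif
}

/* New spelling: one test covers the built-in and the module case. */
static int foo_enabled_new_style(void)
{
	int enabled = MODEL_IS_ENABLED(CONFIG_FOO);

	assert(enabled == 0 || enabled == 1);
	return enabled;
}
```

Because the macro expands to a plain 0 or 1, it can be used both in #if and in ordinary C expressions, which the defined() form cannot.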
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
else we may fail to forward skb even if original fragments do fit
outgoing link mtu:
1. remote sends 2k packets in two 1000 byte frags, DF set
2. we want to forward but only see '2k > mtu and DF set'
3. we then send icmp error saying that outgoing link is 1500
But original sender never sent a packet that would not fit
the outgoing link.
Setting local_df makes outgoing path test size vs.
IPCB(skb)->frag_max_size, so we will still send the correct
error in case the largest original size did not fit
outgoing link mtu.
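
The decision on the output path can be modelled like this. It is a simplified userspace sketch: struct model_pkt and the function name are hypothetical, and the real check in the IPv4 output path has more conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a reassembled packet about to be forwarded. */
struct model_pkt {
	unsigned int len;		/* size after reassembly */
	unsigned int frag_max_size;	/* largest original fragment, 0 if none */
	bool df;			/* DF bit set in the IP header */
	bool local_df;			/* "ignore DF" flag set by defrag */
};

/* Should forwarding answer with an ICMP "fragmentation needed" error? */
static bool model_must_send_frag_needed(const struct model_pkt *p,
					unsigned int mtu)
{
	assert(p != NULL);
	if (p->len <= mtu)
		return false;	/* fits as is, no fragmentation required */
	if (p->df && !p->local_df)
		return true;	/* DF honoured against the reassembled size */
	/* DF ignored: judge by what the sender actually put on the wire. */
	return p->frag_max_size != 0 && p->frag_max_size > mtu;
}
```

For the example in the message (two 1000-byte fragments, DF set, MTU 1500), the model returns true without local_df and false with it, while a sender whose fragments were 1600 bytes still gets the error.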
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Suggested-by: Maxime Bizon <mbizon@freebox.fr>
Fixes: 5f2d04f1f9 (ipv4: fix path MTU discovery with connection tracking)
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pass the hook ops to the hookfn to allow for generic hook
functions. This change is required by nf_tables.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch is a cleanup. Use NFPROTO_* for consistency with other
netfilter code.
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
|
|
There are enough instances of this:
iph->frag_off & htons(IP_MF | IP_OFFSET)
that a helper function is probably warranted.
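
Such a helper might look like the sketch below, written for userspace. The flag values are the standard IPv4 ones; struct model_iphdr and the function name are illustrative, since the message does not name the helper it adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htons() */

/* Standard IPv4 flag and offset masks for the frag_off field. */
#define MODEL_IP_DF	0x4000	/* don't fragment */
#define MODEL_IP_MF	0x2000	/* more fragments follow */
#define MODEL_IP_OFFSET	0x1FFF	/* fragment offset, in 8-byte units */

/* Only the field needed here; frag_off is kept in network byte order. */
struct model_iphdr {
	uint16_t frag_off;
};

/* A packet is a fragment if MF is set or its offset is non-zero. */
static bool model_ip_is_fragment(const struct model_iphdr *iph)
{
	assert(iph != NULL);
	return (iph->frag_off & htons(MODEL_IP_MF | MODEL_IP_OFFSET)) != 0;
}
```

Converting the constant with htons() rather than the header field keeps the byte swap a compile-time operation on the mask.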
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to check the proper socket type within the ipv4_conntrack_defrag
function before referencing the nodefrag flag.
For example, the tun driver receive path produces skbs with
AF_UNSPEC socket type, so the current code causes unwanted
fragmented packets to go out.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch implements the IP_NODEFRAG option for IPv4 sockets.
The reason is that there is no other way to send out a packet with a
user-customized header for the reassembly part.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As reported by Randy Dunlap <randy.dunlap@oracle.com>, compilation
of nf_defrag_ipv4 fails with:
include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type
include/net/netfilter/nf_conntrack.h:178: error: 'const struct sk_buff' has no member named 'nfct'
include/net/netfilter/nf_conntrack.h:185: error: implicit declaration of function 'nf_conntrack_put'
include/net/netfilter/nf_conntrack.h:294: error: 'const struct sk_buff' has no member named 'nfct'
net/ipv4/netfilter/nf_defrag_ipv4.c:45: error: 'struct sk_buff' has no member named 'nfct'
net/ipv4/netfilter/nf_defrag_ipv4.c:46: error: 'struct sk_buff' has no member named 'nfct'
net/nf_conntrack.h must not be included with NF_CONNTRACK=n, add a
few #ifdefs. Long term the header file should be fixed to be usable
even with NF_CONNTRACK=n.
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Normally, each connection needs a unique identity. Conntrack zones allow
specifying a numerical zone using the CT target; connections in different
zones can use the same identity.
Example:
iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag':
net/ipv4/netfilter/nf_defrag_ipv4.c:62: error: implicit declaration of function 'nf_ct_is_template'
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Support initializing selected parameters of new conntrack entries from a
"conntrack template", which is a specially marked conntrack entry attached
to the skb.
Currently the helper and the event delivery masks can be initialized this
way.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
When fragments from bridge netfilter are passed to IPv4 or IPv6 conntrack
and a reassembly queue with the same fragment key already exists from
reassembling a similar packet received on a different device (f.i. with
multicasted fragments), the reassembled packet might continue on a different
codepath than where the head fragment originated. This can cause crashes
in bridge netfilter when a fragment received on a non-bridge device (and
thus with skb->nf_bridge == NULL) continues through the bridge netfilter
code.
Add a new reassembly identifier for packets originating from bridge
netfilter and use it to put those packets in isolated queues.
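
How a separate identifier isolates the queues can be sketched as follows. This is a userspace model: the key layout, the enum values and the function names are hypothetical, and the kernel's reassembly key has a different layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reassembly identifiers: one for the normal conntrack input
 * path, one for packets that arrive via bridge netfilter. */
enum model_defrag_user {
	MODEL_DEFRAG_CONNTRACK_IN,
	MODEL_DEFRAG_CONNTRACK_BRIDGE_IN,
};

/* Hypothetical fragment queue key: the IP tuple plus the identifier. */
struct model_frag_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t id;
	uint8_t  protocol;
	enum model_defrag_user user;
};

/* Pick the identifier from where the fragment came from. */
static enum model_defrag_user model_pick_user(bool from_bridge)
{
	return from_bridge ? MODEL_DEFRAG_CONNTRACK_BRIDGE_IN
			   : MODEL_DEFRAG_CONNTRACK_IN;
}

/* Fragments share a queue only if the tuple and the identifier match. */
static bool model_frag_key_equal(const struct model_frag_key *a,
				 const struct model_frag_key *b)
{
	assert(a != NULL && b != NULL);
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->id == b->id && a->protocol == b->protocol &&
	       a->user == b->user;
}
```

Two copies of the same multicast fragment, one received on a bridge port and one on a plain device, then land in different queues, so a reassembled packet never mixes fragments with and without bridge state.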
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14805
Reported-and-Tested-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Nir Tzachar <nir.tzachar@gmail.com> reported a warning when sending
fragments over loopback with NAT:
[ 6658.338121] WARNING: at net/ipv4/netfilter/nf_nat_standalone.c:89 nf_nat_fn+0x33/0x155()
The reason is that defragmentation is skipped for already tracked connections.
This is wrong in combination with NAT and ip_conntrack actually had some ifdefs
to avoid this behaviour when NAT is compiled in.
The entire "optimization" may seem a bit silly, for now simply restoring the
lost #ifdef is the easiest solution until we can come up with something better.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Netfilter connection tracking requires all IPv4 packets to be defragmented.
Both the socket match and the TPROXY target depend on this functionality, so
this patch separates the Netfilter IPv4 defrag hooks into a separate module.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|