aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter/core.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-09-19net: Add _nf_(un)register_hooks symbolsMahesh Bandewar1-5/+46
Add _nf_register_hooks() and _nf_unregister_hooks() calls which allow caller to hold RTNL mutex. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+2
Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16netfilter: make nf_queue_entry_get_refs return voidFlorian Westphal1-2/+0
We don't care if module is being unloaded anymore since hook unregister handling will destroy queue entries using that hook. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-13netfilter: sync with packet rx also after removing queue entriesFlorian Westphal1-0/+2
We need to sync packet rx again after flushing the queue entries. Otherwise, the following race could happen: cpu1: nf_unregister_hook(H) called, H unliked from lists, calls synchronize_net() to wait for packet rx completion. Problem is that while no new nf_queue_entry structs that use H can be allocated, another CPU might receive a verdict from userspace just before cpu1 calls nf_queue_nf_hook_drop to remove this entry: cpu2: receive verdict from userspace, lock queue cpu2: unlink nf_queue_entry struct E, which references H, from queue list cpu1: calls nf_queue_nf_hook_drop, blocks on queue spinlock cpu2: unlock queue cpu1: nf_queue_nf_hook_drop drops affected queue entries cpu2: call nf_reinject for E cpu1: kfree(H) cpu2: potential use-after-free for H Cc: Eric W. Biederman <ebiederm@xmission.com> Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-05netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack infoKen-ichirou MATSUZAWA1-2/+2
The idea of this series of patch is to attach conntrack information to nflog like nfqueue has already done. nfqueue conntrack info attaching basis is generic, rename those names to generic one, glue. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-04netfilter: nfnetlink_queue: get rid of nfnetlink_queue_ct.cPablo Neira Ayuso1-3/+6
The original intention was to avoid dependencies between nfnetlink_queue and conntrack without ifdef pollution. However, we can achieve this by moving the conntrack dependent code into ctnetlink and keep some glue code to access the nfq_ct indirection from nfqueue. After this patch, the nfq_ct indirection is always compiled in the netfilter core to avoid polluting nfqueue with ifdefs. Thus, if nf_conntrack is not compiled this results in only 8-bytes of memory waste in x86_64. This patch also adds ctnetlink_nfqueue_seqadj() to avoid that the nf_conn structure layout if exposed to nf_queue, which creates another dependency with nf_conntrack at compilation time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18netfilter: Pass priv instead of nf_hook_ops to netfilter hooksEric W. Biederman1-1/+1
Only pass the void *priv parameter out of the nf_hook_ops. That is all any of the functions are interested now, and by limiting what is passed it becomes simpler to change implementation details. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-02netfilter: nf_conntrack: make nf_ct_zone_dflt built-inDaniel Borkmann1-0/+6
Fengguang reported, that some randconfig generated the following linker issue with nf_ct_zone_dflt object involved: [...] CC init/version.o LD init/built-in.o net/built-in.o: In function `ipv4_conntrack_defrag': nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt' net/built-in.o: In function `ipv6_defrag': nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt' make: *** [vmlinux] Error 1 Given that configurations exist where we have a built-in part, which is accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user() and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in area when netfilter is configured in general. Therefore, split the more generic parts into a common header under include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in section that already holds parts related to CONFIG_NF_CONNTRACK in the netfilter core. This fixes the issue on my side. Fixes: 308ac9143ee2 ("netfilter: nf_conntrack: push zone object into functions") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28netfilter: reduce sparse warningsFlorian Westphal1-3/+0
bridge/netfilter/ebtables.c:290:26: warning: incorrect type in assignment (different modifiers) -> remove __pure annotation. ipv6/netfilter/ip6t_SYNPROXY.c:240:27: warning: cast from restricted __be16 -> switch ntohs to htons and vice versa. netfilter/core.c:391:30: warning: symbol 'nfq_ct_nat_hook' was not declared. Should it be static? -> delete it, got removed net/netfilter/nf_synproxy_core.c:221:48: warning: cast to restricted __be32 -> Use __be32 instead of u32. Tested with objdiff that these changes do not affect generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-23netfilter: rename local nf_hook_list to hook_listPablo Neira Ayuso1-14/+14
085db2c04557 ("netfilter: Per network namespace netfilter hooks.") introduced a new nf_hook_list that is global, so let's avoid this overlap. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-23netfilter: fix possible removal of wrong hookPablo Neira Ayuso1-22/+19
nf_unregister_net_hook() uses the nf_hook_ops fields as tuple to look up for the corresponding hook in the list. However, we may have two hooks with exactly the same configuration. This shouldn't be a problem for nftables since every new chain has an unique priv field set, but this may still cause us problems in the future, so better address this problem now by keeping a reference to the original nf_hook_ops structure to make sure we delete the right hook from nf_unregister_net_hook(). Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-23netfilter: nf_queue: fix nf_queue_nf_hook_drop()Pablo Neira Ayuso1-1/+1
This function reacquires the rtnl_lock() which is already held by nf_unregister_hook(). This can be triggered via: modprobe nf_conntrack_ipv4 && rmmod nf_conntrack_ipv4 [ 720.628746] INFO: task rmmod:3578 blocked for more than 120 seconds. [ 720.628749] Not tainted 4.2.0-rc2+ #113 [ 720.628752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 720.628754] rmmod D ffff8800ca46fd58 0 3578 3571 0x00000080 [...] [ 720.628783] Call Trace: [ 720.628790] [<ffffffff8152ea0b>] schedule+0x6b/0x90 [ 720.628795] [<ffffffff8152ecb3>] schedule_preempt_disabled+0x13/0x20 [ 720.628799] [<ffffffff8152ff55>] mutex_lock_nested+0x1f5/0x380 [ 720.628803] [<ffffffff81462622>] ? rtnl_lock+0x12/0x20 [ 720.628807] [<ffffffff81462622>] ? rtnl_lock+0x12/0x20 [ 720.628812] [<ffffffff81462622>] rtnl_lock+0x12/0x20 [ 720.628817] [<ffffffff8148ab25>] nf_queue_nf_hook_drop+0x15/0x160 [ 720.628825] [<ffffffff81488d48>] nf_unregister_net_hook+0x168/0x190 [ 720.628831] [<ffffffff81488e24>] nf_unregister_hook+0x64/0x80 [ 720.628837] [<ffffffff81488e60>] nf_unregister_hooks+0x20/0x30 [...] Moreover, nf_unregister_net_hook() should only destroy the queue for this netns, not for every netns. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-20netfilter: Fix memory leak in nf_register_net_hookEric W. Biederman1-1/+3
In the rare case that when it is a attempted to use a per network device netfilter hook and the network device does not exist the newly allocated structure can leak. Be a good citizen and free the newly allocated structure in the error handling code. Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.") Reported-by: kbuild@01.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-15netfilter: move tee_active to coreFlorian Westphal1-0/+3
This prepares for a TEE like expression in nftables. We want to ensure only one duplicate is sent, so both will use the same percpu variable to detect duplication. The other use case is detection of recursive call to xtables, but since we don't want dependency from nft to xtables core its put into core.c instead of the x_tables core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-15netfilter: Per network namespace netfilter hooks.Eric W. Biederman1-21/+161
- Add a new set of functions for registering and unregistering per network namespace hooks. - Modify the old global namespace hook functions to use the per network namespace hooks in their implementation, so their remains a single list that needs to be walked for any hook (this is important for keeping the hook priority working and for keeping the code walking the hooks simple). - Only allow registering the per netdevice hooks in the network namespace where the network device lives. - Dynamically allocate the structures in the per network namespace hook list in nf_register_net_hook, and unregister them in nf_unregister_net_hook. Dynamic allocate is required somewhere as the number of network namespaces are not fixed so we might as well allocate them in the registration function. The chain of registered hooks on any list is expected to be small so the cost of walking that list to find the entry we are unregistering should also be small. Performing the management of the dynamically allocated list entries in the registration and unregistration functions keeps the complexity from spreading. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-15netfilter: Factor out the hook list selection from nf_register_hookEric W. Biederman1-14/+18
- Add a new function find_nf_hook_list to select the nf_hook_list - Fail nf_register_hook when asked for a per netdevice hook list when support for per netdevice hook lists is not built into the kernel. - Move the hook list head selection outside of nf_hook_mutex as nothing in the selection requires the hook list, and error handling is simpler if a mutex is not held. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-15netfilter: Simply the tests for enabling and disabling the ingress queue hookEric W. Biederman1-11/+6
Replace an overcomplicated switch statement with a simple if statement. This also removes the ingress queue enable outside of nf_hook_mutex as the protection provided by the mutex is not necessary and the code is clearer having both of the static key increments together. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-23netfilter: nf_qeueue: Drop queue entries on nf_unregister_hookEric W. Biederman1-0/+1
Add code to nf_unregister_hook to flush the nf_queue when a hook is unregistered. This guarantees that the pointer that the nf_queue code retains into the nf_hook list will remain valid while a packet is queued. I tested what would happen if we do not flush queued packets and was trivially able to obtain the oops below. All that was required was to stop the nf_queue listening process, to delete all of the nf_tables, and to awaken the nf_queue listening process. > BUG: unable to handle kernel paging request at 0000000100000001 > IP: [<0000000100000001>] 0x100000001 > PGD b9c35067 PUD 0 > Oops: 0010 [#1] SMP > Modules linked in: > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000 > RIP: 0010:[<0000000100000001>] [<0000000100000001>] 0x100000001 > RSP: 0018:ffff8800ba9dba40 EFLAGS: 00010a16 > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90 > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00 > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28 > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900 > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000 > FS: 00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0 > Stack: > ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8 > ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128 > ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0 > Call Trace: > [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0 > [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190 > [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360 > [<ffffffff81386290>] ? nla_parse+0x80/0xf0 > [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240 > [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150 > [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20 > [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0 > [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0 > [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650 > [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50 > [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0 > [<ffffffff810e8f73>] ? __wake_up+0x43/0x70 > [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0 > [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80 > [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a > Code: Bad RIP value. > RIP [<0000000100000001>] 0x100000001 > RSP <ffff8800ba9dba40> > CR2: 0000000100000001 > ---[ end trace 08eb65d42362793f ]--- Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: add netfilter ingress hook after handle_ing() under unique static keyPablo Neira1-1/+30
This patch adds the Netfilter ingress hook just after the existing tc ingress hook, that seems to be the consensus solution for this. Note that the Netfilter hook resides under the global static key that enables ingress filtering. Nonetheless, Netfilter still also has its own static key for minimal impact on the existing handle_ing(). * Without this patch: Result: OK: 6216490(c6216338+d152) usec, 100000000 (60byte,0frags) 16086246pps 7721Mb/sec (7721398080bps) errors: 100000000 42.46% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 25.92% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.81% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.62% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.70% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.34% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.44% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch: Result: OK: 6214833(c6214731+d101) usec, 100000000 (60byte,0frags) 16090536pps 7723Mb/sec (7723457280bps) errors: 100000000 41.23% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 26.57% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.72% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.55% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.78% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.06% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.43% kpktgend_0 [kernel.kallsyms] [k] __build_skb * Without this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9269001(c9268821+d179) usec, 100000000 (60byte,0frags) 10788648pps 5178Mb/sec (5178551040bps) errors: 100000000 40.99% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.50% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.77% kpktgend_0 [cls_u32] [k] u32_classify 5.62% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.18% kpktgend_0 [pktgen] [k] pktgen_thread_worker 3.23% kpktgend_0 [kernel.kallsyms] [k] tc_classify 2.97% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 1.83% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.50% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 0.99% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9308218(c9308091+d126) usec, 100000000 (60byte,0frags) 10743194pps 5156Mb/sec (5156733120bps) errors: 100000000 42.01% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.78% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.70% kpktgend_0 [cls_u32] [k] u32_classify 5.46% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.16% kpktgend_0 [pktgen] [k] pktgen_thread_worker 2.98% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.84% kpktgend_0 [kernel.kallsyms] [k] tc_classify 1.96% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.57% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk Note that the results are very similar before and after. I can see gcc gets the code under the ingress static key out of the hot path. Then, on that cold branch, it generates the code to accomodate the netfilter ingress static key. My explanation for this is that this reduces the pressure on the instruction cache for non-users as the new code is out of the hot path, and it comes with minimal impact for tc ingress users. Using gcc version 4.8.4 on: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 [...] L1d cache: 16K L1i cache: 64K L2 cache: 2048K L3 cache: 8192K Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: add hook list to nf_hook_statePablo Neira1-4/+2
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Make nf_hookfn use nf_hook_state.David S. Miller1-2/+1
Pass the nf_hook_state all the way down into the hook functions themselves. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Create and use nf_hook_state.David S. Miller1-19/+13
Instead of passing a large number of arguments down into the nf_hook() entry points, create a structure which carries this state down through the hook processing layers. This makes is so that if we want to change the types or signatures of any of these pieces of state, there are less places that need to be changed. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13netfilter: fix various sparse warningsFlorian Westphal1-0/+1
net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static? no; add include net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static? yes net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static? no; add include net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static? yes net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static? no; add include net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3) net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3) add __force, 3 is what we want. net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static? yes net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static? no; add include Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-08-25netfilter: HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABELZhouyi Zhou1-3/+3
Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure that the toolchain has the required support in addition to CONFIG_JUMP_LABEL being set. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-08-08netfilter: don't use mutex_lock_interruptible()Pablo Neira Ayuso1-9/+2
Eric Dumazet reports that getsockopt() or setsockopt() sometimes returns -EINTR instead of -ENOPROTOOPT, causing headaches to application developers. This patch replaces all the mutex_lock_interruptible() by mutex_lock() in the netfilter tree, as there is no reason we should sleep for a long time there. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Julian Anastasov <ja@ssi.bg>
2013-10-14netfilter: pass hook ops to hookfnPatrick McHardy1-1/+1
Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach()Patrick McHardy1-3/+4
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-06Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-6/+15
Conflicts: net/netfilter/nf_log.c The conflict in nf_log.c is that in 'net' we added CONFIG_PROC_FS protection around foo_proc_entry() calls to fix a build failure, whereas in Pablo's tree a guard if() test around a call is remove_proc_entry() was removed. Trivially resolved. Pablo Neira Ayuso says: ==================== The following patchset contains the first batch of Netfilter/IPVS updates for your net-next tree, they are: * Three patches with improvements and code refactorization for nfnetlink_queue, from Florian Westphal. * FTP helper now parses replies without brackets, as RFC1123 recommends, from Jeff Mahoney. * Rise a warning to tell everyone about ULOG deprecation, NFLOG has been already in the kernel tree for long time and supersedes the old logging over netlink stub, from myself. * Don't panic if we fail to load netfilter core framework, just bail out instead, from myself. * Add cond_resched_rcu, used by IPVS to allow rescheduling while walking over big hashtables, from Simon Horman. * Change type of IPVS sysctl_sync_qlen_max sysctl to avoid possible overflow, from Zhang Yanfei. * Use strlcpy instead of strncpy to skip zeroing of already initialized area to write the extension names in ebtables, from Chen Gang. * Use already existing per-cpu notrack object from xt_CT, from Eric Dumazet. * Save explicit socket lookup in xt_socket now that we have early demux, also from Eric Dumazet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23netfilter: don't panic on error while walking through the init pathPablo Neira Ayuso1-6/+15
Don't panic if we hit an error while adding the nf_log or pernet netfilter support, just bail out. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-23netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6Florian Westphal1-0/+2
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812: [ ip6tables -m addrtype ] When I tried to use in the nat/PREROUTING it messes up the routing cache even if the rule didn't matched at all. [..] If I remove the --limit-iface-in from the non-working scenario, so just use the -m addrtype --dst-type LOCAL it works! This happens when LOCAL type matching is requested with --limit-iface-in, and the default ipv6 route is via the interface the packet we test arrived on. Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation creates an unwanted cached entry, and the packet won't make it to the real/expected destination. Silently ignoring --limit-iface-in makes the routing work but it breaks rule matching (--dst-type LOCAL with limit-iface-in is supposed to only match if the dst address is configured on the incoming interface; without --limit-iface-in it will match if the address is reachable via lo). The test should call ipv6_chk_addr() instead. However, this would add a link-time dependency on ipv6. There are two possible solutions: 1) Revert the commit that moved ipt_addrtype to xt_addrtype, and put ipv6 specific code into ip6t_addrtype. 2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions. While the former might seem preferable, Pablo pointed out that there are more xt modules with link-time dependeny issues regarding ipv6, so lets go for 2). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-18netfilter: add my copyright statementsPatrick McHardy1-0/+1
Add copyright statements to all netfilter files which have had significant changes done by myself in the past. Some notes: - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter Core Team when it got split out of nf_conntrack_core.c. The copyrights even state a date which lies six years before it was written. It was written in 2005 by Harald and myself. - net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright statements. I've added the copyright statement from net/netfilter/core.c, where this code originated - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want it to give the wrong impression Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: remove unneeded variable proc_net_netfilterPablo Neira Ayuso1-12/+4
Now that this supports net namespace for nflog and nfqueue, we can remove the global proc_net_netfilter which has no clients anymore. Based on patch from Gao feng. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: make /proc/net/netfilter pernetGao feng1-4/+29
This patch makes this proc dentry pernet. So far only init_net had a /proc/net/netfilter directory. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03netfilter: kill support for per-af queue backendsFlorian Westphal1-2/+0
We used to have several queueing backends, but nowadays only nfnetlink_queue remains. In light of this there doesn't seem to be a good reason to support per-af registering -- just hook up nfnetlink_queue on module load and remove it on unload. This means that the userspace BIND/UNBIND_PF commands are now obsolete; the kernel will ignore them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_queue()Michael Wang1-2/+2
Since 'list_for_each_continue_rcu' has already been replaced by 'list_for_each_entry_continue_rcu', pass 'list_head' to nf_queue() as a parameter can not benefit us any more. This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of nf_queue() and __nf_queue() to save code. Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_iterate()Michael Wang1-14/+10
Since 'list_for_each_continue_rcu' has already been replaced by 'list_for_each_entry_continue_rcu', pass 'list_head' to nf_iterate() as a parameter can not benefit us any more. This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of nf_iterate() to save code. Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30netfilter: add protocol independent NAT corePatrick McHardy1-0/+5
Convert the IPv4 NAT implementation to a protocol independent core and address family specific modules. Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-22netfilter: replace list_for_each_continue_rcu with new interfaceMichael Wang1-4/+6
This patch replaces list_for_each_continue_rcu() with list_for_each_entry_continue_rcu() to allow removing list_for_each_continue_rcu(). Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-22netfilter: nfnetlink_queue: fix compilation with CONFIG_NF_NAT=m and CONFIG_NF_CT_NETLINK=yPablo Neira Ayuso1-0/+3
LD init/built-in.o net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust' make: *** [vmlinux] Error 1 This patch adds a new pointer hook (nfq_ct_nat_hook) similar to other existing in Netfilter to solve our complicated configuration dependencies. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-20netfilter: nfq_ct_hook needs __rcu and __read_mostlyPablo Neira Ayuso1-1/+1
This removes some sparse warnings. Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16netfilter: add glue code to integrate nfnetlink_queue and ctnetlinkPablo Neira Ayuso1-0/+4
This patch allows you to include the conntrack information together with the packet that is sent to user-space via NFQUEUE. Previously, there was no integration between ctnetlink and nfnetlink_queue. If you wanted to access conntrack information from your libnetfilter_queue program, you required to query ctnetlink from user-space to obtain it. Thus, delaying the packet processing even more. Including the conntrack information is optional, you can set it via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-20net: Delete all remaining instances of ctl_pathEric W. Biederman1-9/+0
We don't use struct ctl_path anymore so delete the exported constants. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-24static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()Ingo Molnar1-3/+3
So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: a.p.zijlstra@chello.nl Cc: mathieu.desnoyers@efficios.com Cc: davem@davemloft.net Cc: ddaney.cavm@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-16net:netfilter: use IS_ENABLEDIgor Maravić1-1/+1
Use IS_ENABLED(CONFIG_FOO) instead of defined(CONFIG_FOO) || defined (CONFIG_FOO_MODULE) Signed-off-by: Igor Maravić <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-21netfilter: use jump_label for nf_hooksEric Dumazet1-1/+12
On configs where CONFIG_JUMP_LABEL=y, we can replace in fast path a load/compare/conditional jump by a single jump with no dcache reference. Jump target is modified as soon as nf_hooks[pf][hook] switches from empty state to non empty states. jump_label state is kept outside of nf_hooks array so has no cost on cpu caches. This patch removes the test on CONFIG_NETFILTER_DEBUG : No need to call nf_hook_slow() at all if nf_hooks[pf][hook] is empty, this didnt give useful information, but slowed down things a lot. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01netfilter: do not propagate nf_queue errors in nf_hook_slowFlorian Westphal1-6/+5
commit f15850861860636c905b33a9a5be3dcbc2b0d56a (netfilter: nfnetlink_queue: return error number to caller) erronously assigns the return value of nf_queue() to the "ret" value. This can cause bogus return values if we encounter QUEUE verdict when bypassing is enabled, the listener does not exist and the next hook returns NF_STOLEN. In this case nf_hook_slow returned -ESRCH instead of 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-08-02rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTERStephen Hemminger1-2/+2
When assigning a NULL value to an RCU protected pointer, no barrier is needed. The rcu_assign_pointer, used to handle that but will soon change to not handle the special case. Convert all rcu_assign_pointer of NULL value. //smpl @@ expression P; @@ - rcu_assign_pointer(P, NULL) + RCU_INIT_POINTER(P, NULL) // </smpl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-19Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller1-1/+2
Conflicts: Documentation/feature-removal-schedule.txt drivers/net/e1000e/netdev.c net/xfrm/xfrm_policy.c
2011-02-14netfilter: nf_iterate: fix incorrect RCU usagePatrick McHardy1-1/+2
As noticed by Eric, nf_iterate doesn't use RCU correctly by accessing the prev pointer of a RCU protected list element when a verdict of NF_REPEAT is issued. Fix by jumping backwards to the hook invocation directly instead of loading the previous list element before continuing the list iteration. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18netfilter: allow NFQUEUE bypass if no listener is availableFlorian Westphal1-0/+3
If an skb is to be NF_QUEUE'd, but no program has opened the queue, the packet is dropped. This adds a v2 target revision of xt_NFQUEUE that allows packets to continue through the ruleset instead. Because the actual queueing happens outside of the target context, the 'bypass' flag has to be communicated back to the netfilter core. Unfortunately the only choice to do this without adding a new function argument is to use the target function return value (i.e. the verdict). In the NF_QUEUE case, the upper 16bit already contain the queue number to use. The previous patch reduced NF_VERDICT_MASK to 0xff, i.e. we now have extra room for a new flag. If a hook issued a NF_QUEUE verdict, then the netfilter core will continue packet processing if the queueing hook returns -ESRCH (== "this queue does not exist") and the new NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value. Note: If the queue exists, but userspace does not consume packets fast enough, the skb will still be dropped. Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>