aboutsummaryrefslogtreecommitdiffstats
path: root/include/net/net_namespace.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2013-10-14netfilter: nf_tables: complete net namespace supportPablo Neira Ayuso1-0/+4
Register family per netnamespace to ensure that sets are only visible in its approapriate namespace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+1
Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/usb/qmi_wwan.c drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h include/net/netfilter/nf_conntrack_synproxy.h include/net/secure_seq.h The conflicts are of two varieties: 1) Conflicts with Joe Perches's 'extern' removal from header file function declarations. Usually it's an argument signature change or a function being added/removed. The resolutions are trivial. 2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds a new value, another changes an existing value. That sort of thing. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28net: Delay default_device_exit_batch until no devices are unregistering v2Eric W. Biederman1-0/+1
There is currently serialization network namespaces exiting and network devices exiting as the final part of netdev_run_todo does not happen under the rtnl_lock. This is compounded by the fact that the only list of devices unregistering in netdev_run_todo is local to the netdev_run_todo. This lack of serialization in extreme cases results in network devices unregistering in netdev_run_todo after the loopback device of their network namespace has been freed (making dst_ifdown unsafe), and after the their network namespace has exited (making the NETDEV_UNREGISTER, and NETDEV_UNREGISTER_FINAL callbacks unsafe). Add the missing serialization by a per network namespace count of how many network devices are unregistering and having a wait queue that is woken up whenever the count is decreased. The count and wait queue allow default_device_exit_batch to wait until all of the unregistration activity for a network namespace has finished before proceeding to unregister the loopback device and then allowing the network namespace to exit. Only a single global wait queue is used because there is a single global lock, and there is a single waiter, per network namespace wait queues would be a waste of resources. The per network namespace count of unregistering devices gives a progress guarantee because the number of network devices unregistering in an exiting network namespace must ultimately drop to zero (assuming network device unregistration completes). The basic logic remains the same as in v1. This patch is now half comment and half rtnl_lock_unregistering an expanded version of wait_event performs no extra work in the common case where no network devices are unregistering when we get to default_device_exit_batch. Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21net_namespace.h: Remove extern from function prototypesJoe Perches1-14/+14
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31net: split rt_genid for ipv4 and ipv6fan.du1-5/+32
Current net name space has only one genid for both IPv4 and IPv6, it has below drawbacks: - Add/delete an IPv4 address will invalidate all IPv6 routing table entries. - Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table entries even when the policy is only applied for one address family. Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6 separately in a fine granularity. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-26netns: exclude ipvs from struct net when IPVS disabledJunweiZhang1-0/+2
no real problem is fixed, just save a few bytes in net_namespace structure. Signed-off-by: JunweiZhang <junwei.zhang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-03ipv4: use separate genid for next hop exceptionsTimo Teräs1-0/+11
commit 13d82bf5 (ipv4: Fix flushing of cached routing informations) added the support to flush learned pmtu information. However, using rt_genid is quite heavy as it is bumped on route add/change and multicast events amongst other places. These can happen quite often, especially if using dynamic routing protocols. While this is ok with routes (as they are just recreated locally), the pmtu information is learned from remote systems and the icmp notification can come with long delays. It is worthy to have separate genid to avoid excessive pmtu resets. Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05netfilter: make /proc/net/netfilter pernetGao feng1-0/+2
This patch makes this proc dentry pernet. So far only init_net had a /proc/net/netfilter directory. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-11-20proc: Usable inode numbers for the namespace file descriptors.Eric W. Biederman1-0/+2
Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowd by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-18userns: make each net (net_ns) belong to a user_nsEric W. Biederman1-2/+7
The user namespace which creates a new network namespace owns that namespace and all resources created in it. This way we can target capability checks for privileged operations against network resources to the user_ns which created the network namespace in which the resource lives. Privilege to the user namespace which owns the network namespace, or any parent user namespace thereof, provides the same privilege to the network resource. This patch is reworked from a version originally by Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-18netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NSEric W. Biederman1-6/+9
The copy of copy_net_ns used when the network stack is not built is broken as it does not return -EINVAL when attempting to create a new network namespace. We don't even have a previous network namespace. Since we need a copy of copy_net_ns in net/net_namespace.h that is available when the networking stack is not built at all move the correct version of copy_net_ns from net_namespace.c into net_namespace.h Leaving us with just 2 versions of copy_net_ns. One version for when we compile in network namespace suport and another stub for all other occasions. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-10-06sections: fix section conflicts in netAndi Kleen1-0/+2
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+10
Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19ipv6: add a new namespace for nf_conntrack_reasmAmerigo Wang1-0/+3
As pointed by Michal, it is necessary to add a new namespace for nf_conntrack_reasm code, this prepares for the second patch. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18netns: move net->ipv4.rt_genid to net->rt_genidNicolas Dichtel1-0/+10
This commit prepares the use of rt_genid by both IPv4 and IPv6. Initialization is left in IPv4 part. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14sctp: Make the address lists per network namespaceEric W. Biederman1-0/+4
- Move the address lists into struct net - Add per network namespace initialization and cleanup - Pass around struct net so it is everywhere I need it. - Rename all of the global variable references into references to the variables moved into struct net Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09net: Loopback ifindex is constant nowPavel Emelyanov1-0/+7
As pointed out, there are places, that access net->loopback_dev->ifindex and after ifindex generation is made per-net this value becomes constant equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use it where appropriate. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09net: Make ifindex generation per-net namespacePavel Emelyanov1-0/+1
Strictly speaking this is only _really_ required for checkpoint-restore to make loopback device always have the same index. This change appears to be safe wrt "ifindex should be unique per-system" concept, as all the ifindex usage is either already made per net namespace of is explicitly limited with init_net only. There are two cool side effects of this. The first one -- ifindices of devices in container are always small, regardless of how many containers we've started (and re-started) so far. The second one is -- we can speed up the loopback ifidex access as shown in the next patch. v2: Place ifindex right after dev_base_seq : avoid two holes and use the same cache line, dirtied in list_netdevice()/unlist_netdevice() Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16net: make sock diag per-namespaceAndrey Vagin1-0/+1
Before this patch sock_diag works for init_net only and dumps information about sockets from all namespaces. This patch expands sock_diag for all name-spaces. It creates a netlink kernel socket for each netns and filters data during dumping. v2: filter accoding with netns in all places remove an unused variable. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Andrew Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23net sysctl: Add place holder functions for when sysctl support is compiled out of the kernel.Eric W. Biederman1-3/+12
Randy Dunlap <rdunlap@xenotime.net> reported: > On 04/23/2012 12:07 AM, Stephen Rothwell wrote: > >> Hi all, >> >> Changes since 20120420: > > > > ERROR: "unregister_net_sysctl_table" [net/phonet/phonet.ko] undefined! > ERROR: "register_net_sysctl" [net/phonet/phonet.ko] undefined! > > when CONFIG_SYSCTL is not enabled. Add static inline stub functions to gracefully handle the case when sysctl support is not present. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net: Remove register_net_sysctl_tableEric W. Biederman1-3/+0
All of the users have been converted to use registera_net_sysctl so we no longer need register_net_sysctl. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net: Kill register_sysctl_rotableEric W. Biederman1-2/+0
register_sysctl_rotable never caught on as an interesting way to register sysctls. My take on the situation is that what we want are sysctls that we can only see in the initial network namespace. What we have implemented with register_sysctl_rotable are sysctls that we can see in all of the network namespaces and can only change in the initial network namespace. That is a very silly way to go. Just register the network sysctls in the initial network namespace and we don't have any weird special cases to deal with. The sysctls affected are: /proc/sys/net/ipv4/ipfrag_secret_interval /proc/sys/net/ipv4/ipfrag_max_dist /proc/sys/net/ipv6/ip6frag_secret_interval /proc/sys/net/ipv6/mld_max_msf I really don't expect anyone will miss them if they can't read them in a child user namespace. CC: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net sysctl: Initialize the network sysctls sooner to avoid problems.Eric W. Biederman1-0/+5
If the netfilter code is modified to use register_net_sysctl_table the kernel fails to boot because the per net sysctl infrasturce is not setup soon enough. So to avoid races call net_sysctl_init from sock_init(). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net: Implement register_net_sysctl.Eric W. Biederman1-0/+2
Right now all of the networking sysctl registrations are running in a compatibiity mode. The natvie sysctl registration api takes a cstring for a path and a simple ctl_table. Implement register_net_sysctl so that we can register network sysctls without needing to use compatiblity code in the sysctl core. Switching from a ctl_path to a cstring results in less boiler plate and denser code that is a little easier to read. I would simply have changed the arguments to register_net_sysctl_table instead of keeping two functions in parallel but gcc will allow a ctl_path pointer to be passed to a char * pointer with only issuing a warning resulting in completely incorrect code can be built. Since I have to change the function name I am taking advantage of the situation to let both register_net_sysctl and register_net_sysctl_table live for a short time in parallel which makes clean conversion patches a bit easier to read and write. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-11net: use IS_ENABLED(CONFIG_IPV6)Eric Dumazet1-1/+1
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-26atomic: use <linux/atomic.h>Arun Sharma1-1/+1
This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-01rtnl: provide link dump consistency infoThomas Graf1-0/+1
This patch adds a change sequence counter to each net namespace which is bumped whenever a netdevice is added or removed from the list. If such a change occurred while a link dump took place, the dump will have the NLM_F_DUMP_INTR flag set in the first message which has been interrupted and in all subsequent messages of the same dump. Note that links may still be modified or renamed while a dump is taking place but we can guarantee for userspace to receive a complete list of links and not miss any. Testing: I have added 500 VLAN netdevices to make sure the dump is split over multiple messages. Then while continuously dumping links in one process I also continuously deleted and re-added a dummy netdevice in another process. Multiple dumps per seconds have had the NLM_F_DUMP_INTR flag set. I guess we can wait for Johannes patch to hit net-next via the wireless tree. I just wanted to give this some testing right away. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-12Delay struct net freeing while there's a sysfs instance refering to itAl Viro1-1/+9
* new refcount in struct net, controlling actual freeing of the memory * new method in kobj_ns_type_operations (->drop_ns()) * ->current_ns() semantics change - it's supposed to be followed by corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps the new refcount; net_drop_ns() decrements it and calls net_free() if the last reference has been dropped. Method renamed to ->grab_current_ns(). * old net_free() callers call net_drop_ns() instead. * sysfs_exit_ns() is gone, along with a large part of callchain leading to it; now that the references stored in ->ns[...] stay valid we do not need to hunt them down and replace them with NULL. That fixes problems in sysfs_lookup() and sysfs_readdir(), along with getting rid of sb->s_instances abuse. Note that struct net *shutdown* logics has not changed - net_cleanup() is called exactly when it used to be called. The only thing postponed by having a sysfs instance refering to that struct net is actual freeing of memory occupied by struct net. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds1-0/+1
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: net: Kill ratelimit.h dependency in linux/net.h net: Add linux/sysctl.h includes where needed. net: Kill ether_table[] declaration. inetpeer: fix race in unused_list manipulations atm: expose ATM device index in sysfs IPVS: bug in ip_vs_ftp, same list heaad used in all netns. bug.h: Move ratelimit warn interfaces to ratelimit.h bonding: cleanup module option descriptions net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version. net: davinci_emac: fix dev_err use at probe can: convert to %pK for kptr_restrict support net: fix ETHTOOL_SFEATURES compatibility with old ethtool_ops.set_flags netfilter: Fix several warnings in compat_mtw_from_user(). netfilter: ipset: fix ip_set_flush return code netfilter: ipset: remove unused variable from type_pf_tdel() netfilter: ipset: Use proper timeout value to jiffies conversion
2011-05-27net: Add linux/sysctl.h includes where needed.David S. Miller1-0/+1
Several networking headers were depending upon the implicit linux/sysctl.h include they get when including linux/net.h Add explicit includes. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10net: Allow setting the network namespace by fdEric W. Biederman1-0/+1
Take advantage of the new abstraction and allow network devices to be placed in any network namespace that we have a fd to talk about. Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-03-15ipvs: move struct netns_ipvsJulian Anastasov1-1/+1
Remove include/net/netns/ip_vs.h because it depends on structures from include/net/ip_vs.h. As ipvs is pointer in struct net it is better to move struct netns_ipvs into include/net/ip_vs.h, so that we can easily use other structures in struct netns_ipvs. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13IPVS: netns, add basic init per netns.Hans Schillstrom1-0/+2
Preparation for network name-space init, in this stage some empty functions exists. In most files there is a check if it is root ns i.e. init_net if (!net_eq(net, &init_net)) return ... this will be removed by the last patch, when enabling name-space. *v3 ip_vs_conn.c merge error corrected. net_ipvs #ifdef removed as sugested by Jan Engelhardt [ horms@verge.net.au: Removed whitespace-change-only hunks ] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-25net_ns: add __rcu annotationsEric Dumazet1-1/+1
add __rcu annotation to (struct net)->gen, and use rcu_dereference_protected() in net_assign_generic() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17netns: reorder fields in struct netEric Dumazet1-7/+10
In a network bench, I noticed an unfortunate false sharing between 'loopback_dev' and 'count' fields in "struct net". 'count' is written each time a socket is created or destroyed, while loopback_dev might be often read in routing code. Move loopback_dev in a read mostly section of "struct net" Note: struct netns_xfrm is cache line aligned on SMP. (It contains a "struct dst_ops") Move it at the end to avoid holes, and reduce sizeof(struct net) by 128 bytes on ia32. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-12nsproxy: remove INIT_NSPROXY()Alexey Dobriyan1-5/+0
Remove INIT_NSPROXY(), use C99 initializer. Remove INIT_IPC_NS(), INIT_NET_NS() while I'm at it. Note: headers trim will be done later, now it's quite pointless because results will be invalidated by merge window. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-13netfilter: nfnetlink: netns supportAlexey Dobriyan1-0/+2
Make nfnl socket per-petns. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-03net: Add support for batching network namespace cleanupsEric W. Biederman1-0/+2
- Add exit_list to struct net to support building lists of network namespaces to cleanup. - Add exit_batch to pernet_operations to allow running operations only once during a network namespace exit. Instead of once per network namespace. - Factor opt ops_exit_list and ops_exit_free so the logic with cleanup up a network namespace does not need to be duplicated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01net: remove [un]register_pernet_gen_... and update the docs.Eric W. Biederman1-22/+0
No that all of the callers have been updated to set fields in struct pernet_operations, and simplified to let the network namespace core handle the allocation and freeing of the storage for them, remove the surpurpflous methods and update the docs to the new style. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01net: Automatically allocate per namespace data.Eric W. Biederman1-4/+24
To get the full benefit of batched network namespace cleanup netowrk device deletion needs to be performed by the generic code. When using register_pernet_gen_device and freeing the data in exit_net it is impossible to delay allocation until after exit_net has called as the device uninit methods are no longer safe. To correct this, and to simplify working with per network namespace data I have moved allocation and deletion of per network namespace data into the network namespace core. The core now frees the data only after all of the network namespace exit routines have run. Now it is only required to set the new fields .id and .size in the pernet_operations structure if you want network namespace data to be managed for you automatically. This makes the current register_pernet_gen_device and register_pernet_gen_subsys routines unnecessary. For the moment I have left them as compatibility wrappers in net_namespace.h They will be removed once all of the users have been updated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01net: Batch network namespace destruction.Eric W. Biederman1-1/+1
It is fairly common to kill several network namespaces at once. Either because they are nested one inside the other or because they are cooperating in multiple machine networking experiments. As the network stack control logic does not parallelize easily batch up multiple network namespaces existing together. To get the full benefit of batching the virtual network devices to be removed must be all removed in one batch. For that purpose I have added a loop after the last network device operations have run that batches up all remaining network devices and deletes them. An extra benefit is that the reorganization slightly shrinks the size of the per network namespace data structures replaceing a work_struct with a list_head. In a trivial test with 4K namespaces this change reduced the cost of a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds (at 60% cpu). The bulk of that 44s was spent in inet_twsk_purge. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24rtnetlink: speedup rtnl_dump_ifinfo()Eric Dumazet1-0/+4
When handling large number of netdevice, rtnl_dump_ifinfo() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the 256 sub lists of the dev_index hash table. This considerably speedups "ip link" operations Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07wext: refactorJohannes Berg1-1/+1
Refactor wext to * split out iwpriv handling * split out iwspy handling * split out procfs support * allow cfg80211 to have wireless extensions compat code w/o CONFIG_WIRELESS_EXT After this, drivers need to - select WIRELESS_EXT - for wext support - select WEXT_PRIV - for iwpriv support - select WEXT_SPY - for iwspy support except cfg80211 -- which gets new hooks in wext-core.c and can then get wext handlers without CONFIG_WIRELESS_EXT. Wireless extensions procfs support is auto-selected based on PROC_FS and anything that requires the wext core (i.e. WIRELESS_EXT or CFG80211_WEXT). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-20net: explain netns notifiers a little betterJohannes Berg1-7/+9
Eric explained this to me -- and afterwards the comment made sense, but not before. Add the the critical point about interfaces having to be gone from the netns before subsys notifiers are called. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-15wireless extensions: make netns awareJohannes Berg1-0/+3
This makes wireless extensions netns aware. The tasklet sending the events is converted to a work struct so that we can rtnl_lock() in it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12net: move and export get_net_ns_by_pidJohannes Berg1-0/+2
The function get_net_ns_by_pid(), to get a network namespace from a pid_t, will be required in cfg80211 as well. Therefore, let's move it to net_namespace.c and export it. We can't make it a static inline in the !NETNS case because it needs to verify that the given pid even exists (and return -ESRCH). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12genetlink: make netns awareJohannes Berg1-0/+2
This makes generic netlink network namespace aware. No generic netlink families except for the controller family are made namespace aware, they need to be checked one by one and then set the family->netnsok member to true. A new function genlmsg_multicast_netns() is introduced to allow sending a multicast message in a given namespace, for example when it applies to an object that lives in that namespace, a new function genlmsg_multicast_allns() to send a message to all network namespaces (for objects that do not have an associated netns). The function genlmsg_multicast() is changed to multicast the message in just init_net, which is currently correct for all generic netlink families since they only work in init_net right now. Some will later want to work in all net namespaces because they do not care about the netns at all -- those will have to be converted to use one of the new functions genlmsg_multicast_allns() or genlmsg_multicast_netns() whenever they are made netns aware in some way. After this patch families can easily decide whether or not they should be available in all net namespaces. Many genl families us it for objects not related to networking and should therefore be available in all namespaces, but that will have to be done on a per family basis. Note that this doesn't touch on the checkpoint/restart problem where network namespaces could be used, genl families and multicast groups are numbered globally and I see no easy way of changing that, especially since it must be possible to multicast to all network namespaces for those families that do not care about netns. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12net: make namespace iteration possible under RCUJohannes Berg1-0/+3
All we need to take care of is using proper RCU list add/del primitives and inserting a synchronize_rcu() at one place to make sure the exit notifiers are run after everybody has stopped iterating the list. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03netns: Remove net_aliveEric W. Biederman1-10/+17
It turns out that net_alive is unnecessary, and the original problem that led to it being added was simply that the icmp code thought it was a network device and wound up being unable to handle packets while there were still packets in the network namespace. Now that icmp and tcp have been fixed to properly register themselves this problem is no longer present and we have a stronger guarantee that packets will not arrive in a network namespace then that provided by net_alive in netif_receive_skb. So remove net_alive allowing packet reception run a little faster. Additionally document the strong reason why network namespace cleanup is safe so that if something happens again someone else will have a chance of figuring it out. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25netns xfrm: add netns boilerplateAlexey Dobriyan1-0/+4
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>