linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2007-11-15	[INET]: Fix potential kfree on vmalloc-ed area of request_sock_queue	Pavel Emelyanov	1	-17/+1
	The request_sock_queue's listen_opt is either vmalloc-ed or kmalloc-ed depending on the number of table entries. Thus it is expected to be handled properly on free, which is done in the reqsk_queue_destroy(). However the error path in inet_csk_listen_start() calls the lite version of reqsk_queue_destroy, called __reqsk_queue_destroy, which calls the kfree unconditionally. Fix this and move the __reqsk_queue_destroy into a .c file as it looks too big to be inline. As David also noticed, this is an error recovery path only, so no locking is required and the lopt is known to be not NULL. reqsk_queue_yank_listen_sk is also now only used in net/core/request_sock.c so we should move it there too. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-14	[TCP]: Fix size calculation in sk_stream_alloc_pskb	Herbert Xu	1	-4/+6
	We round up the header size in sk_stream_alloc_pskb so that TSO packets get zero tail room. Unfortunately this rounding up is not coordinated with the select_size() function used by TCP to calculate the second parameter of sk_stream_alloc_pskb. As a result, we may allocate more than a page of data in the non-TSO case when exactly one page is desired. In fact, rounding up the head room is detrimental in the non-TSO case because it makes memory that would otherwise be available to the payload head room. TSO doesn't need this either, all it wants is the guarantee that there is no tail room. So this patch fixes this by adjusting the skb_reserve call so that exactly the requested amount (which all callers have calculated in a precise way) is made available as tail room. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-13	[NET]: Move unneeded data to initdata section.	Denis V. Lunev	1	-0/+2
	This patch reverts Eric's commit 2b008b0a8e96b726c603c5e1a5a7a509b5f61e35 It diets .text & .data section of the kernel if CONFIG_NET_NS is not set. This is safe after list operations cleanup. Signed-of-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12	[INET]: Use list_head-s in inetpeer.c	Pavel Emelyanov	1	-1/+1
	The inetpeer.c tracks the LRU list of inet_perr-s, but makes it by hands. Use the list_head-s for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12	[INET]: Remove leftover prototypes from include/net/inet_common.h	Arnaldo Carvalho de Melo	1	-4/+0
	Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12	Merge branch 'pending' of master.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev	David S. Miller	4	-13/+18

2007-11-10	[INET]: Small possible memory leak in FIB rules	Denis V. Lunev	1	-0/+3
	This patch fixes a small memory leak. Default fib rules can be deleted by the user if the rule does not carry FIB_RULE_PERMANENT flag, f.e. by ip rule flush Such a rule will not be freed as the ref-counter has 2 on start and becomes clearly unreachable after removal. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10	[AF_UNIX]: Make unix_tot_inflight counter non-atomic	Pavel Emelyanov	1	-1/+1
	This counter is _always_ modified under the unix_gc_lock spinlock, so its atomicity can be provided w/o additional efforts. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10	mac80211: remove unused driver ops	Johannes Berg	1	-21/+0
	The driver operations set_ieee8021x(), set_port_auth() and set_privacy_invoked() are not used by any drivers, except set_privacy_invoked() they aren't even used by mac80211. Remove them at least until we need to support drivers with mac80211 that require getting this information. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-11-10	mac80211: allow driver to ask for a rate control algorithm	Johannes Berg	1	-0/+5
	This allows a driver to ask for a specific rate control algorithm. The rate control algorithm asked for must be registered and be available as a module or built-in. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-11-10	[NET]: Make helper to get dst entry and "use" it	Pavel Emelyanov	1	-0/+7
	There are many places that get the dst entry, increase the __use counter and set the "lastuse" time stamp. Make a helper for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10	[INET]: Add a missing include <linux/vmalloc.h> to inet_hashtables.h	Eric Dumazet	1	-0/+1
	Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-09	SCTP: Clean-up some defines for regressions tests.	Vlad Yasevich	1	-1/+0
	Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-09	SCTP: Make sctp_verify_param return multiple indications.	Vlad Yasevich	1	-0/+2
	SCTP-AUTH and future ADD-IP updates have a requirement to do additional verification of parameters and an ability to ABORT the association if verification fails. So, introduce additional return code so that we can clear signal a required action. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-09	SCTP: Convert custom hash lists to use hlist.	Vlad Yasevich	2	-6/+7
	Convert the custom hash list traversals to use hlist functions. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07	SCTP: Allow ADD_IP to work with AUTH for backward compatibility.	Vlad Yasevich	1	-0/+2
	This patch adds a tunable that will allow ADD_IP to work without AUTH for backward compatibility. The default value is off since the default value for ADD_IP is off as well. People who need to use ADD-IP with older implementations take risks of connection hijacking and should consider upgrading or turning this tunable on. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07	SCTP: Correctly disable ADD-IP when AUTH is not supported.	Vlad Yasevich	1	-1/+0
	Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07	SCTP: Update RCU handling during the ADD-IP case	Vlad Yasevich	1	-3/+1
	After learning more about rcu, it looks like the ADD-IP hadling doesn't need to call call_rcu_bh. All the rcu critical sections use rcu_read_lock, so using call_rcu_bh is wrong here. Now, restore the local_bh_disable() code blocks and use normal call_rcu() calls. Also restore the missing return statement. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07	SCTP: Fix difference cases of retransmit.	Vlad Yasevich	4	-2/+6
	Commit d0ce92910bc04e107b2f3f2048f07e94f570035d broke several retransmit cases including fast retransmit. The reason is that we should only delay by rto while doing retranmists as a result of a timeout. Retransmit as a result of path mtu discover, fast retransmit, or other evernts that should trigger immidiate retransmissions got broken. Also, since rto is doubled prior to marking of packets elegable for retransmission, we never marked correct chunks anyway. The fix is provide a reason for a given retransmission so that we can mark chunks appropriately and to save the old rto value to do comparisons against. All regressions tests passed with this code. Spotted by Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07	[INET]: Remove per bucket rwlock in tcp/dccp ehash table.	Eric Dumazet	1	-6/+65
	As done two years ago on IP route cache table (commit 22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per hash bucket for the huge TCP/DCCP hash tables. On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle performance differences. (we hit a different cache line for the rwlock, but then the bucket cache line have a better sharing factor among cpus, since we dirty it less often). For netstat or ss commands that want a full scan of hash table, we perform fewer memory accesses. Using a 'small' table of hashed rwlocks should be more than enough to provide correct SMP concurrency between different buckets, without using too much memory. Sizing of this table depends on num_possible_cpus() and various CONFIG settings. This patch provides some locking abstraction that may ease a future work using a different model for TCP/DCCP table. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07	[IPVS]: Synchronize closing of Connections	Rumen G. Bogdanovski	1	-0/+4
	This patch makes the master daemon to sync the connection when it is about to close. This makes the connections on the backup to close or timeout according their state. Before the sync was performed only if the connection is in ESTABLISHED state which always made the connections to timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch ([IPVS]: use proper timeout instead of fixed value) effectively did nothing more than increasing this to 15 minutes (Established state timeout). So this patch makes use of proper timeout since it syncs the connections on status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout). However if the backup misses CLOSE hopefully it did not miss FIN_WAIT. Otherwise we will just have to wait for the ESTABLISHED state timeout. As it is without this patch. This way the number of the hanging connections on the backup is kept to minimum. And very few of them will be left to timeout with a long timeout. This is important if we want to make use of the fix for the real server overcommit on master/backup fail-over. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07	[IPVS]: Bind connections on stanby if the destination exists	Rumen G. Bogdanovski	1	-0/+4
	This patch fixes the problem with node overload on director fail-over. Given the scenario: 2 nodes each accepting 3 connections at a time and 2 directors, director failover occurs when the nodes are fully loaded (6 connections to the cluster) in this case the new director will assign another 6 connections to the cluster, If the same real servers exist there. The problem turned to be in not binding the inherited connections to the real servers (destinations) on the backup director. Therefore: "ipvsadm -l" reports 0 connections: root@test2:~# ipvsadm -l IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP test2.local:5999 wlc -> node473.local:5999 Route 1000 0 0 -> node484.local:5999 Route 1000 0 0 while "ipvs -lnc" is right root@test2:~# ipvsadm -lnc IPVS connection entries pro expire state source virtual destination TCP 14:56 ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999 TCP 14:59 ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999 So the patch I am sending fixes the problem by binding the received connections to the appropriate service on the backup director, if it exists, else the connection will be handled the old way. So if the master and the backup directors are synchronized in terms of real services there will be no problem with server over-committing since new connections will not be created on the nonexistent real services on the backup. However if the service is created later on the backup, the binding will be performed when the next connection update is received. With this patch the inherited connections will show as inactive on the backup: root@test2:~# ipvsadm -l IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP test2.local:5999 wlc -> node473.local:5999 Route 1000 0 1 -> node484.local:5999 Route 1000 0 1 rumen@test2:~$ cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP C0A800DE:176F wlc -> C0A80033:176F Route 1000 0 1 -> C0A80032:176F Route 1000 0 1 Regards, Rumen Bogdanovski Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2007-11-07	[IPV4]: Compact some ifdefs in the fib code.	Pavel Emelyanov	1	-9/+6
	There are places that check for CONFIG_IP_MULTIPLE_TABLES twice in the same file, but the internals of these #ifdefs can be merged. As a side effect - remove one ifdef from inside a function. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07	[NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way.	Eric Dumazet	1	-6/+57
	"struct proto" currently uses an array stats[NR_CPUS] to track change on 'inuse' sockets per protocol. If NR_CPUS is big, this means we use a big memory area for this. Moreover, all this memory area is located on a single node on NUMA machines, increasing memory pressure on the boot node. In this patch, I tried to : - Keep a fast !CONFIG_SMP implementation - Keep a fast CONFIG_SMP implementation for often used protocols (tcp,udp,raw,...) - Introduce a NUMA efficient implementation Some helper macros are defined in include/net/sock.h These macros take into account CONFIG_SMP If a "struct proto" is declared without using DEFINE_PROTO_INUSE / REF_PROTO_INUSE macros, it will automatically use a default implementation, using a dynamically allocated percpu zone. This default implementation will be NUMA efficient, but might use 32/64 bytes per possible cpu because of current alloc_percpu() implementation. However it still should be better than previous implementation based on stats[NR_CPUS] field. When a "struct proto" is changed to use the new macros, we use a single static "int" percpu variable, lowering the memory and cpu costs, still preserving NUMA efficiency. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-02	cleanup asm/scatterlist.h includes	Adrian Bunk	1	-1/+1
	Not architecture specific code should not #include <asm/scatterlist.h>. This patch therefore either replaces them with #include <linux/scatterlist.h> or simply removes them if they were unused. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-01	[NET]: Relax the reference counting of init_net_ns	Pavel Emelyanov	1	-8/+25
	When the CONFIG_NET_NS is n there's no need in refcounting the initial net namespace. So relax this code by making a stupid stubs for the "n" case. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01	[NET]: Forget the zero_it argument of sk_alloc()	Pavel Emelyanov	1	-1/+1
	Finally, the zero_it argument can be completely removed from the callers and from the function prototype. Besides, fix the checkpatch.pl warnings about using the assignments inside if-s. This patch is rather big, and it is a part of the previous one. I splitted it wishing to make the patches more readable. Hope this particular split helped. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01	[NET]: Move the sock_copy() from the header	Pavel Emelyanov	1	-14/+0
	The sock_copy() call is not used outside the sock.c file, so just move it into a sock.c Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-29	SCTP endianness annotations regression	Al Viro	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-26	[NET]: Marking struct pernet_operations __net_initdata was inappropriate	Eric W. Biederman	1	-2/+0
	It is not safe to to place struct pernet_operations in a special section. We need struct pernet_operations to last until we call unregister_pernet_subsys. Which doesn't happen until module unload. So marking struct pernet_operations is a disaster for modules in two ways. - We discard it before we call the exit method it points to. - Because I keep struct pernet_operations on a linked list discarding it for compiled in code removes elements in the middle of a linked list and does horrible things for linked insert. So this looks safe assuming __exit_refok is not discarded for modules. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26	[SCTP] net/sctp/auth.c: make 3 functions static	Adrian Bunk	1	-1/+0
	This patch makes three needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26	[SCTP]: #if 0 sctp_update_copy_cksum()	Adrian Bunk	1	-1/+0
	sctp_update_copy_cksum() is no longer used. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26	[IRDA]: Make ircomm_tty static.	Adrian Bunk	1	-1/+0
	ircomm_tty can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26	[NET_CLS_ACT]: Introduce skb_act_clone	Jamal Hadi Salim	1	-0/+15
	Reworked skb_clone looks uglier with the single ifdef CONFIG_NET_CLS_ACT This patch introduces skb_act_clone which will replace skb_clone in tc actions Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-25	[UDP]: Make use of inet_iif() when doing socket lookups.	Vlad Yasevich	2	-6/+7
	UDP currently uses skb->dev->ifindex which may provide the wrong information when the socket bound to a specific interface. This patch makes inet_iif() accessible to UDP and makes UDP use it. The scenario we are trying to fix is when a client is running on the same system and the server and both client and server bind to a non-loopback device. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-23	[NET]: Don't declare extern variables in net/core/sysctl_net_core.c	Pavel Emelyanov	1	-0/+2
	Some are already declared in include/linux/netdevice.h, while some others (xfrm ones) need to be declared. The driver/net/rrunner.c just uses same extern as well, so cleanup it also. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-23	[TCP]: Remove unneeded implicit type cast when calling tcp_minshall_update()	Chuck Lever	1	-1/+1
	The tcp_minshall_update() function is called in exactly one place, and is passed an unsigned integer for the mss_len argument. Make the sign of the argument match the sign of the passed variable in order to eliminate an unneeded implicit type cast and a mixed sign comparison in tcp_minshall_update(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-22	[Bluetooth] Add support for handling simple eSCO links	Marcel Holtmann	1	-0/+1
	With the Bluetooth 1.2 specification the Extended SCO feature for better audio connections was introduced. So far the Bluetooth core wasn't able to handle any eSCO connections correctly. This patch adds simple eSCO support while keeping backward compatibility with older devices. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22	[Bluetooth] Fall back to L2CAP in basic mode	Marcel Holtmann	1	-0/+13
	In case the remote entity tries to negogiate retransmission or flow control mode, reject it and fall back to basic mode. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22	[Bluetooth] Retrieve L2CAP features mask on connection setup	Marcel Holtmann	1	-2/+12
	The Bluetooth 1.2 specification introduced a specific features mask value to interoperate with newer versions of the specification. So far this piece of information was never needed, but future extensions will rely on it. This patch adds a generic way to retrieve this information only once per connection setup. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22	[Bluetooth] Remove global conf_mtu variable from L2CAP	Marcel Holtmann	1	-5/+5
	After the change to the L2CAP configuration parameter handling the global conf_mtu variable is no longer needed and so remove it. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22	[Bluetooth] Switch from OGF+OCF to using only opcodes	Marcel Holtmann	2	-257/+359
	The Bluetooth HCI commands are divided into logical OGF groups for easier identification of their purposes. While this still makes sense for the written specification, its makes the code only more complex and harder to read. So instead of using separate OGF and OCF values to identify the commands, use a common 16-bit opcode that combines both values. As a side effect this also reduces the complexity of OGF and OCF calculations during command header parsing. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-19	Spelling fix: explicitly	Jean Delvare	1	-2/+2
	From: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19	Use helpers to obtain task pid in printks	Pavel Emelyanov	1	-2/+2
	The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19	pid namespaces: changes to show virtual ids to user	Pavel Emelyanov	1	-1/+3
	This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17	[IPSEC]: Rename mode to outer_mode and add inner_mode	Herbert Xu	1	-1/+2
	This patch adds a new field to xfrm states called inner_mode. The existing mode object is renamed to outer_mode. This is the first part of an attempt to fix inter-family transforms. As it is we always use the outer family when determining which mode to use. As a result we may end up shoving IPv4 packets into netfilter6 and vice versa. What we really want is to use the inner family for the first part of outbound processing and the outer family for the second part. For inbound processing we'd use the opposite pairing. I've also added a check to prevent silly combinations such as transport mode with inter-family transforms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17	[IPSEC]: Store afinfo pointer in xfrm_mode	Herbert Xu	1	-3/+3
	It is convenient to have a pointer from xfrm_state to address-specific functions such as the output function for a family. Currently the address-specific policy code calls out to the xfrm state code to get those pointers when we could get it in an easier way via the state itself. This patch adds an xfrm_state_afinfo to xfrm_mode (since they're address-specific) and changes the policy code to use it. I've also added an owner field to do reference counting on the module providing the afinfo even though it isn't strictly necessary today since IPv6 can't be unloaded yet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17	[IPSEC]: Add missing BEET checks	Herbert Xu	1	-0/+6
	Currently BEET mode does not reinject the packet back into the stack like tunnel mode does. Since BEET should behave just like tunnel mode this is incorrect. This patch fixes this by introducing a flags field to xfrm_mode that tells the IPsec code whether it should terminate and reinject the packet back into the stack. It then sets the flag for BEET and tunnel mode. I've also added a number of missing BEET checks elsewhere where we check whether a given mode is a tunnel or not. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17	[IPSEC]: Move type and mode map into xfrm_state.c	Herbert Xu	1	-6/+2
	The type and mode maps are only used by SAs, not policies. So it makes sense to move them from xfrm_policy.c into xfrm_state.c. This also allows us to mark xfrm_get_type/xfrm_put_type/xfrm_get_mode/xfrm_put_mode as static. The only other change I've made in the move is to get rid of the casts on the request_module call for types. They're unnecessary because C will promote them to ints anyway. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17	[IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi	Herbert Xu	1	-1/+1
	Currently xfrm6_rcv_spi gets the nexthdr value itself from the packet. This means that we need to fix up the value in case we have a 4-on-6 tunnel. Moving this logic into the caller simplifies things and allows us to merge the code with IPv4. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>