aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2006-12-02[TCP]: Fix some warning when MD5 is disabled.David S. Miller1-1/+1
Just some mis-placed ifdefs: net/ipv4/tcp_minisocks.c: In function ‘tcp_twsk_destructor’: net/ipv4/tcp_minisocks.c:364: warning: unused variable ‘twsk’ net/ipv6/tcp_ipv6.c:1846: warning: ‘tcp_sock_ipv6_specific’ defined but not used net/ipv6/tcp_ipv6.c:1877: warning: ‘tcp_sock_ipv6_mapped_specific’ defined but not used Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[TCP]: MD5 Signature Option (RFC2385) support.YOSHIFUJI Hideaki6-36/+973
Based on implementation by Rick Payne. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[TCP/DCCP]: Introduce net_xmit_evalGerrit Renker2-9/+2
Throughout the TCP/DCCP (and tunnelling) code, it often happens that the return code of a transmit function needs to be tested against NET_XMIT_CN which is a value that does not indicate a strict error condition. This patch uses a macro for these recurring situations which is consistent with the already existing macro net_xmit_errno, saving on duplicated code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02[TCP] htcp: Better packing of struct htcp.David S. Miller1-2/+2
Based upon a patch by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NETLINK]: Do precise netlink message allocations where possibleThomas Graf3-11/+51
Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[TCP]: Remove dead code in init_sequenceGerrit Renker1-2/+2
This removes two redundancies: 1) The test (skb->protocol == htons(ETH_P_IPV6) in tcp_v6_init_sequence() is always true, due to * tcp_v6_conn_request() is the only function calling this one * tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()] 2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is never used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[TCP]: Don't set SKB owner in tcp_transmit_skb().David S. Miller2-4/+2
The data itself is already charged to the SKB, doing the skb_set_owner_w() just generates a lot of noise and extra atomics we don't really need. Lmbench improvements on lat_tcp are minimal: before: TCP latency using localhost: 23.2701 microseconds TCP latency using localhost: 23.1994 microseconds TCP latency using localhost: 23.2257 microseconds after: TCP latency using localhost: 22.8380 microseconds TCP latency using localhost: 22.9465 microseconds TCP latency using localhost: 22.8462 microseconds Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[TCP]: Allow autoloading of congestion control via setsockopt.Stephen Hemminger1-1/+11
If user has permision to load modules, then autoload then attempt autoload of TCP congestion module. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[TCP]: Restrict congestion control choices.Stephen Hemminger2-0/+115
Allow normal users to only choose among a restricted set of congestion control choices. The default is reno and what ever has been configured as default. But the policy can be changed by administrator at any time. For example, to allow any choice: cp /proc/sys/net/ipv4/tcp_available_congestion_control \ /proc/sys/net/ipv4/tcp_allowed_congestion_control Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[TCP]: Add tcp_available_congestion_control sysctl.Stephen Hemminger2-0/+40
Create /proc/sys/net/ipv4/tcp_available_congestion_control that reflects currently available TCP choices. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: Size listen hash tables using backlog hintEric Dumazet3-5/+5
We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending of : - net.core.somaxconn tunable (default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of kmalloc(). We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM usage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET] rules: Share common attribute validation policyThomas Graf1-5/+1
Move the attribute policy for the non-specific attributes into net/fib_rules.h and include it in the respective protocols. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET] rules: Protocol independant mark selectorThomas Graf1-29/+0
Move mark selector currently implemented per protocol into the protocol independant part. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[IPV4] nl_fib_lookup: Rename fl_fwmark to fl_markThomas Graf1-1/+1
For the sake of consistency. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: Rethink mark field in struct flowiThomas Graf6-52/+13
Now that all protocols have been made aware of the mark field it can be moved out of the union thus simplyfing its usage. The config options in the IPv4/IPv6/DECnet subsystems to enable respectively disable mark based routing only obfuscate the code with ifdefs, the cost for the additional comparison in the flow key is insignificant, and most distributions have all these options enabled by default anyway. Therefore it makes sense to remove the config options and enable mark based routing by default. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: Turn nfmark into generic markThomas Graf9-15/+15
nfmark is being used in various subsystems and has become the defacto mark field for all kinds of packets. Therefore it makes sense to rename it to `mark' and remove the dependency on CONFIG_NETFILTER. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02SELinux: Return correct context for SO_PEERSECVenkat Yekkirala1-0/+2
Fix SO_PEERSEC for tcp sockets to return the security context of the peer (as represented by the SA from the peer) as opposed to the SA used by the local/source socket. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02[IPV4]: encapsulation annotationsAl Viro4-38/+38
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[XFRM]: misc annotationsAl Viro1-3/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: ipconfig and nfsroot annotationsAl Viro1-52/+53
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-28[NETFILTER]: ipt_REJECT: fix memory corruptionPatrick McHardy1-7/+9
On devices with hard_header_len > LL_MAX_HEADER ip_route_me_harder() reallocates the skb, leading to memory corruption when using the stale tcph pointer to update the checksum. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-28[NETFILTER]: conntrack: fix refcount leak when finding expectationYasuyuki Kozakai1-3/+3
All users of __{ip,nf}_conntrack_expect_find() don't expect that it increments the reference count of expectation. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-28[NETFILTER]: ctnetlink: fix reference count leakPatrick McHardy1-0/+1
When NFA_NEST exceeds the skb size the protocol reference is leaked. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-25[NET]: Fix kfifo_alloc() error check.Akinobu Mita1-0/+2
The return value of kfifo_alloc() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-25[UDP]: Make udp_encap_rcv use pskb_may_pullOlaf Kirch1-5/+14
Make udp_encap_rcv use pskb_may_pull IPsec with NAT-T breaks on some notebooks using the latest e1000 chipset, when header split is enabled. When receiving sufficiently large packets, the driver puts everything up to and including the UDP header into the header portion of the skb, and the rest goes into the paged part. udp_encap_rcv forgets to use pskb_may_pull, and fails to decapsulate it. Instead, it passes it up it to the IKE daemon. Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-25[NETFILTER]: H.323 conntrack: fix crash with CONFIG_IP_NF_CT_ACCTFaidon Liambotis1-2/+2
H.323 connection tracking code calls ip_ct_refresh_acct() when processing RCFs and URQs but passes NULL as the skb. When CONFIG_IP_NF_CT_ACCT is enabled, the connection tracking core tries to derefence the skb, which results in an obvious panic. A similar fix was applied on the SIP connection tracking code some time ago. Signed-off-by: Faidon Liambotis <paravoid@debian.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-15[TCP]: Fix up sysctl_tcp_mem initialization.John Heffner1-3/+4
Fix up tcp_mem initial settings to take into account the size of the hash entries (different on SMP and non-SMP systems). Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-15[NETFILTER]: Use pskb_trim in {ip,ip6,nfnetlink}_queuePatrick McHardy1-3/+4
Based on patch by James D. Nurmi: I've got some code very dependant on nfnetlink_queue, and turned up a large number of warns coming from skb_trim. While it's quite possibly my code, having not seen it on older kernels made me a bit suspect. Anyhow, based on some googling I turned up this thread: http://lkml.org/lkml/2006/8/13/56 And believe the issue to be related, so attached is a small patch to the kernel -- not sure if this is completely correct, but for anyone else hitting the WARN_ON(1) in skbuff.h, it might be helpful.. Signed-off-by: James D. Nurmi <jdnurmi@gmail.com> Ported to ip6_queue and nfnetlink_queue and added return value checks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-10[IPVS]: More endianness fixed.Julian Anastasov3-6/+6
- make sure port in FTP data is in network order (in fact it was looking buggy for big endian boxes before Viro's changes) - htonl -> htons for port Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-07[TCP]: Don't use highmem in tcp hash size calculation.John Heffner1-2/+2
This patch removes consideration of high memory when determining TCP hash table sizes. Taking into account high memory results in tcp_mem values that are too large. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-01[TCP]: Set default congestion control when no sysctl.Stephen Hemminger2-7/+8
The setting of the default congestion control was buried in the sysctl code so it would not be done properly if SYSCTL was not enabled. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30[NetLabel]: protect the CIPSOv4 socket option from setsockopt()Paul Moore2-5/+4
This patch makes two changes to protect applications from either removing or tampering with the CIPSOv4 IP option on a socket. The first is the requirement that applications have the CAP_NET_RAW capability to set an IPOPT_CIPSO option on a socket; this prevents untrusted applications from setting their own CIPSOv4 security attributes on the packets they send. The second change is to SELinux and it prevents applications from setting any IPv4 options when there is an IPOPT_CIPSO option already present on the socket; this prevents applications from removing CIPSOv4 security attributes from the packets they send. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30[NETFILTER]: ip_tables: compat code module refcounting fixDmitry Mishin1-25/+11
This patch fixes bug in iptables modules refcounting on compat error way. As we are getting modules in check_compat_entry_size_and_hooks(), in case of later error, we should put them all in translate_compat_table(), not in the compat_copy_entry_from_user() or compat_copy_match_from_user(), as it is now. Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-by: Vasily Averin <vvs@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30[NETFILTER]: ip_tables: compat error way cleanupVasily Averin1-0/+1
This patch adds forgotten compat_flush_offset() call to error way of translate_compat_table(). May lead to table corruption on the next compat_do_replace(). Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Dmitry Mishin <dim@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30[NETFILTER]: Missed and reordered checks in {arp,ip,ip6}_tablesDmitry Mishin2-17/+38
There is a number of issues in parsing user-provided table in translate_table(). Malicious user with CAP_NET_ADMIN may crash system by passing special-crafted table to the *_tables. The first issue is that mark_source_chains() function is called before entry content checks. In case of standard target, mark_source_chains() function uses t->verdict field in order to determine new position. But the check, that this field leads no further, than the table end, is in check_entry(), which is called later, than mark_source_chains(). The second issue, that there is no check that target_offset points inside entry. If so, *_ITERATE_MATCH macro will follow further, than the entry ends. As a result, we'll have oops or memory disclosure. And the third issue, that there is no check that the target is completely inside entry. Results are the same, as in previous issue. Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30[NET]: fix uaccess handlingHeiko Carstens1-6/+11
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-25[TCP] H-TCP: fix integer overflowGavin McCullagh1-1/+1
When using H-TCP with a single flow on a 500Mbit connection (or less actually), alpha can exceed 65000, so alpha needs to be a u32. Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie> Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-25[TCP] cubic: scaling errorStephen Hemminger1-3/+3
Doug Leith observed a discrepancy between the version of CUBIC described in the papers and the version in 2.6.18. A math error related to scaling causes Cubic to grow too slowly. Patch is from "Sangtae Ha" <sha2@ncsu.edu>. I validated that it does fix the problems. See the following to show behavior over 500ms 100 Mbit link. Sender (2.6.19-rc3) --- Bridge (2.6.18-rt7) ------- Receiver (2.6.19-rc3) 1G [netem] 100M http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-orig.png http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-fix.png Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-24[IPV4] ipconfig: fix RARP ic_servaddr breakageAl Viro1-1/+1
memcpy 4 bytes to address of auto unsigned long variable followed by comparison with u32 is a bloody bad idea. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-20[TCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err()Eric Dumazet1-1/+1
I believe this NET_INC_STATS() call can be replaced by NET_INC_STATS_BH(), a little bit cheaper. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-20[NETFILTER]: Missing check for CAP_NET_ADMIN in iptables compat layerBjörn Steinbrink1-0/+3
The 32bit compatibility layer has no CAP_NET_ADMIN check in compat_do_ipt_get_ctl, which for example allows to list the current iptables rules even without having that capability (the non-compat version requires it). Other capabilities might be required to exploit the bug (eg. CAP_NET_RAW to get the nfnetlink socket?), so a plain user can't exploit it, but a setup actually using the posix capability system might very well hit such a constellation of granted capabilities. Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18[TCP]: Bound TSO defer timeJohn Heffner1-5/+15
This patch limits the amount of time you will defer sending a TSO segment to less than two clock ticks, or the time between two acks, whichever is longer. On slow links, deferring causes significant bursts. See attached plots, which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue for (a) non-TSO, (b) currnet TSO, and (c) patched TSO. This burstiness causes significant jitter, tends to overflow queues early (bad for short queues), and makes delay-based congestion control more difficult. Deferring by a couple clock ticks I believe will have a relatively small impact on performance. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18[IPv4] fib: Remove unused fib_config membersThomas Graf1-5/+0
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-15[NET]: reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()Eric Dumazet1-7/+22
1) shrink struct inet_peer on 64 bits platforms.
2006-10-15NetLabel: the CIPSOv4 passthrough mapping does not pass categories correctlyPaul Moore1-2/+2
The CIPSO passthrough mapping had a problem when sending categories which would cause no or incorrect categories to be sent on the wire with a packet. This patch fixes the problem which was a simple off-by-one bug. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
2006-10-15NetLabel: only deref the CIPSOv4 standard map fields when using standard mappingPaul Moore1-6/+12
Fix several places in the CIPSO code where it was dereferencing fields which did not have valid pointers by moving those pointer dereferences into code blocks where the pointers are valid. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
2006-10-15[NETFILTER]: ctnetlink: Remove debugging messagesPablo Neira Ayuso1-69/+3
Remove (compilation-breaking) debugging messages introduced at early development stage. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-15[NETFILTER]: ipt_ECN/ipt_TOS: fix incorrect checksum updatePatrick McHardy2-6/+6
Even though the tos field is only a single byte large, the values need to be converted to net-endian for the checkum update so they are in the corrent byte position. Also fix incorrect endian annotations. Reported by Stephane Chazelas <Stephane_Chazelas@yahoo.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-15[NETFILTER]: arp_tables: missing unregistration on module unloadPatrick McHardy1-0/+2
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-12[NET]: Do not memcmp() over pad bytes of struct flowi.David S. Miller1-3/+9
They are not necessarily initialized to zero by the compiler, for example when using run-time initializers of automatic on-stack variables. Noticed by Eric Dumazet and Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>