aboutsummaryrefslogtreecommitdiffstats
path: root/include/net/dst.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2010-12-14net: Abstract default MTU metric calculation behind an accessor.David S. Miller1-7/+8
Like RTAX_ADVMSS, make the default calculation go through a dst_ops method rather than caching the computation in the routing cache entries. Now dst metrics are pretty much left as-is when new entries are created, thus optimizing metric sharing becomes a real possibility. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-13net: Abstract default ADVMSS behind an accessor.David S. Miller1-1/+13
Make all RTAX_ADVMSS metric accesses go through a new helper function, dst_metric_advmss(). Leave the actual default metric as "zero" in the real metric slot, and compute the actual default value dynamically via a new dst_ops AF specific callback. For stacked IPSEC routes, we use the advmss of the path which preserves existing behavior. Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates advmss on pmtu updates. This inconsistency in advmss handling results in more raw metric accesses than I wish we ended up with. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-12ipv4: Don't pre-seed hoplimit metric.David S. Miller1-6/+0
Always go through a new ip4_dst_hoplimit() helper, just like ipv6. This allowed several simplifications: 1) The interim dst_metric_hoplimit() can go as it's no longer userd. 2) The sysctl_ip_default_ttl entry no longer needs to use ipv4_doint_and_flush, since the sysctl is not cached in routing cache metrics any longer. 3) ipv4_doint_and_flush no longer needs to be exported and therefore can be marked static. When ipv4_doint_and_flush_strategy was removed some time ago, the external declaration in ip.h was mistakenly left around so kill that off too. We have to move the sysctl_ip_default_ttl declaration into ipv4's route cache definition header net/route.h, because currently net/ip.h (where the declaration lives now) has a back dependency on net/route.h Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-12net: Abstract RTAX_HOPLIMIT metric accesses behind helper.David S. Miller1-1/+14
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-09net: Abstract away all dst_entry metrics accesses.David S. Miller1-3/+23
Use helper functions to hide all direct accesses, especially writes, to dst_entry metrics values. This will allow us to: 1) More easily change how the metrics are stored. 2) Implement COW for metrics. In particular this will help us put metrics into the inetpeer cache if that is what we end up doing. We can make the _metrics member a pointer instead of an array, initially have it point at the read-only metrics in the FIB, and then on the first set grab an inetpeer entry and point the _metrics member there. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-11-08decnet: RCU conversion and get rid of dev_base_lockEric Dumazet1-4/+4
While tracking dev_base_lock users, I found decnet used it in dnet_select_source(), but for a wrong purpose: Writers only hold RTNL, not dev_base_lock, so readers must use RCU if they cannot use RTNL. Adds an rcu_head in struct dn_ifaddr and handle proper RCU management. Adds __rcu annotation in dn_route as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27ipv4: add __rcu annotations to routes.cEric Dumazet1-1/+1
Add __rcu annotations to : (struct dst_entry)->rt_next (struct rt_hash_bucket)->chain And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03net: introduce DST_NOCACHE flagEric Dumazet1-4/+5
While doing stress tests with IP route cache disabled, and multi queue devices, I noticed a very high contention on one rwlock used in neighbour code. When many cpus are trying to send frames (possibly using a high performance multiqueue device) to the same neighbour, they fight for the neigh->lock rwlock in order to call neigh_hh_init(), and fight on hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test()) But we dont need to call neigh_hh_init() for dst that are used only once. It costs four atomic operations at least, on two contended cache lines, plus the high contention on neigh->lock rwlock. Introduce a new dst flag, DST_NOCACHE, that is set when dst was not inserted in route cache. With the stress test bench, sending 160000000 frames on one neighbour, results are : Before patch: real 2m28.406s user 0m11.781s sys 36m17.964s After patch: real 1m26.532s user 0m12.185s sys 20m3.903s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27tunnels: prepare percpu accountingEric Dumazet1-5/+19
Tunnels are going to use percpu for their accounting. They are going to use a new tstats field in net_device. skb_tunnel_rx() is changed to be a wrapper around __skb_tunnel_rx() IPTUNNEL_XMIT() is changed to be a wrapper around __IPTUNNEL_XMIT() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-26net: reset skb queue mapping when rx'ing over tunnelTom Herbert1-0/+1
Reset queue mapping when an skb is reentering the stack via a tunnel. On second pass, the queue mapping from the original device is no longer valid. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-04net: check for refcount if pop a stacked dst_entrySteffen Klassert1-3/+3
xfrm triggers a warning if dst_pop() drops a refcount on a noref dst. This patch changes dst_pop() to skb_dst_pop(). skb_dst_pop() drops the refcnt only on a refcounted dst. Also we don't clone the child dst_entry, so it is not refcounted and we can use skb_dst_set_noref() in xfrm_output_one(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: Introduce skb_tunnel_rx() helperEric Dumazet1-0/+20
skb rxhash should be cleared when a skb is handled by a tunnel before being delivered again, so that correct packet steering can take place. There are other cleanups and accounting that we can factorize in a new helper, skb_tunnel_rx() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: add a noref bit on skb dstEric Dumazet1-3/+45
Use low order bit of skb->_skb_dst to tell dst is not refcounted. Change _skb_dst to _skb_refdst to make sure all uses are catched. skb_dst() returns the dst, regardless of noref bit set or not, but with a lockdep check to make sure a noref dst is not given if current user is not rcu protected. New skb_dst_set_noref() helper to set an notrefcounted dst on a skb. (with lockdep check) skb_dst_drop() drops a reference only if skb dst was refcounted. skb_dst_force() helper is used to force a refcount on dst, when skb is queued and not anymore RCU protected. Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in sock_queue_rcv_skb(), in __nf_queue(). Use skb_dst_force() in dev_requeue_skb(). Note: dst_use_noref() still dirties dst, we might transform it later to do one dirtying per jiffies. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13net: sk_dst_cache RCUificationEric Dumazet1-15/+0
With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this work. sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock) This rwlock is readlocked for a very small amount of time, and dst entries are already freed after RCU grace period. This calls for RCU again :) This patch converts sk_dst_lock to a spinlock, and use RCU for readers. __sk_dst_get() is supposed to be called with rcu_read_lock() or if socket locked by user, so use appropriate rcu_dereference_check() condition (rcu_read_lock_held() || sock_owned_by_user(sk)) This patch avoids two atomic ops per tx packet on UDP connected sockets, for example, and permits sk_dst_lock to be much less dirtied. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-23net: Add rtnetlink init_rcvwnd to set the TCP initial receive windowlaurent chavey1-2/+0
Add rtnetlink init_rcvwnd to set the TCP initial receive window size advertised by passive and active TCP connections. The current Linux TCP implementation limits the advertised TCP initial receive window to the one prescribed by slow start. For short lived TCP connections used for transaction type of traffic (i.e. http requests), bounding the advertised TCP initial receive window results in increased latency to complete the transaction. Support for setting initial congestion window is already supported using rtnetlink init_cwnd, but the feature is useless without the ability to set a larger TCP initial receive window. The rtnetlink init_rcvwnd allows increasing the TCP initial receive window, allowing TCP connection to advertise larger TCP receive window than the ones bounded by slow start. Signed-off-by: Laurent Chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-15tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.David S. Miller1-1/+1
It creates a regression, triggering badness for SYN_RECV sockets, for example: [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293 [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000 [19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32) [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR> CR: 24002442 XER: 00000000 [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000 This is likely caused by the change in the 'estab' parameter passed to tcp_parse_options() when invoked by the functions in net/ipv4/tcp_minisocks.c But even if that is fixed, the ->conn_request() changes made in this patch series is fundamentally wrong. They try to use the listening socket's 'dst' to probe the route settings. The listening socket doesn't even have a route, and you can't get the right route (the child request one) until much later after we setup all of the state, and it must be done by hand. This stuff really isn't ready, so the best thing to do is a full revert. This reverts the following commits: f55017a93f1a74d50244b1254b9a2bd7ac9bbf7d 022c3f7d82f0f1c68018696f2f027b87b9bb45c2 1aba721eba1d84a2defce45b950272cee1e6c72a cda42ebd67ee5fdf09d7057b5a4584d36fe8a335 345cda2fd695534be5a4494f1b59da9daed33663 dc343475ed062e13fc260acccaab91d7d80fd5b2 05eaade2782fb0c90d3034fd7a7d5a16266182bb 6a2a2d6bf8581216e08be15fcb563cfd6c430e1e Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04tcp: Use defaults when no route options are availableGilad Ben-Yossef1-1/+1
Trying to parse the option of a SYN packet that we have no route entry for should just use global wide defaults for route entry options. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Tested-by: Valdis.Kletnieks@vt.edu Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04net: cleanup include/netEric Dumazet1-2/+1
This cleanup patch puts struct/union/enum opening braces, in first line to ease grep games. struct something { becomes : struct something { Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Add dst_feature to query route entry featuresGilad Ben-Yossef1-1/+7
Adding an accessor to existing dst_entry feautres field and refactor the only supported feature (allfrag) to use it. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20net: Fix for dst_negative_adviceKrishna Kumar1-2/+10
dst_negative_advice() should check for changed dst and reset sk_tx_queue_mapping accordingly. Pass sock to the callers of dst_negative_advice. (sk_reset_txq is defined just for use by dst_negative_advice. The only way I could find to get around this is to move dst_negative_() from dst.h to dst.c, include sock.h in dst.c, etc) Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01netns: embed ip6_dst_ops directlyAlexey Dobriyan1-22/+1
struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated, but there is no fundamental reason for it. Embed it directly into struct netns_ipv6. For that: * move struct dst_ops into separate header to fix circular dependencies I honestly tried not to, it's pretty impossible to do other way * drop dynamical allocation, allocate together with netns For a change, remove struct dst_ops::dst_net, it's deducible by using container_of() given dst_ops pointer. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03net: skb->dst accessorsEric Dumazet1-3/+9
Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25netns xfrm: lookup in netnsAlexey Dobriyan1-8/+8
Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns to flow_cache_lookup() and resolver callback. Take it from socket or netdevice. Stub DECnet to init_net. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16net: make sure struct dst_entry refcount is aligned on 64 bytesEric Dumazet1-0/+19
As found in the past (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17 [NET]: Fix tbench regression in 2.6.25-rc1), it is really important that struct dst_entry refcount is aligned on a cache line. We cannot use __atribute((aligned)), so manually pad the structure for 32 and 64 bit arches. for 32bit : offsetof(truct dst_entry, __refcnt) is 0x80 for 64bit : offsetof(truct dst_entry, __refcnt) is 0xc0 As it is not possible to guess at compile time cache line size, we use a generic value of 64 bytes, that satisfies many current arches. (Using 128 bytes alignment on 64bit arches would waste 64 bytes) Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" dont break this alignment. "tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz (2350 MB/s instead of 2250 MB/s) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-11net: remove struct dst_entry::entry_sizeAlexey Dobriyan1-1/+0
Unused after kmem_cache_zalloc() conversion. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: reduce structures when XFRM=nAlexey Dobriyan1-1/+2
ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04net: Kill plain NET_XMIT_BYPASS.David S. Miller1-11/+1
dst_input() was doing something completely absurd, looping on skb->dst->input() if NET_XMIT_BYPASS was seen, but these functions never return such an error. And as a result plain ole' NET_XMIT_BYPASS has no more references and can be completely killed off. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18tcp: RTT metrics scalingStephen Hemminger1-0/+12
Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in kernel units (jiffies) and this leaks out through the netlink API to user space where the units for jiffies are unknown. This patches changes the kernel to convert to/from milliseconds. This changes the ABI, but milliseconds seemed like the most natural unit for these parameters. Values available via syscall in /proc/net/rt_cache and netlink will be in milliseconds. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27[NET]: uninline dst_releaseIlpo Järvinen1-9/+1
Codiff stats (allyesconfig, v2.6.24-mm1): -16420 187 funcs, 103 +, 16523 -, diff: -16420 --- dst_release Without number of debug related CONFIGs (v2.6.25-rc2-mm1): -7257 186 funcs, 70 +, 7327 -, diff: -7257 --- dst_release dst_release | +40 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-12[NET]: Fix tbench regression in 2.6.25-rc1Zhang Yanmin1-9/+14
Comparing with kernel 2.6.24, tbench result has regression with 2.6.25-rc1. 1) On 2 quad-core processor stoakley: 4%. 2) On 4 quad-core processor tigerton: more than 30%. bisect located below patch. b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit commit b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Tue Nov 13 21:33:32 2007 -0800 [IPV6]: Move nfheader_len into rt6_info The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. Above patch changes the cache line alignment, especially member __refcnt. I did a testing by adding 2 unsigned long pading before lastuse, so the 3 members, lastuse/__refcnt/__use, are moved to next cache line. The performance is recovered. I created a patch to rearrange the members in struct dst_entry. With Eric and Valdis Kletnieks's suggestion, I made finer arrangement. 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by moving tclassid to different place. It looks like tclassid could also have impact on performance. If moving tclassid before metrics, or just don't move tclassid, the performance isn't good. So I move it behind metrics. 2) Add comments before __refcnt. On 16-core tigerton: If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than the one without the patch; If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than the one without the patch. With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't introduce regression. Thank Eric, Valdis, and David! Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[DST]: shrinks sizeof(struct rtable) by 64 bytes on x86_64Eric Dumazet1-5/+5
On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to 0x180 bytes by SLAB allocator. We can reduce this to exactly 0x140 bytes, without alignment overhead, and store 12 struct rtable per PAGE instead of 10. rate_tokens is currently defined as an "unsigned long", while its content should not exceed 6*HZ. It can safely be converted to an unsigned int. Moving tclassid right after rate_tokens to fill the 4 bytes hole permits to save 8 bytes on 'struct dst_entry', which finally permits to save 8 bytes on 'struct rtable' Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][DST]: Add the network namespace pointer in dst_opsDaniel Lezcano1-0/+1
The network namespace pointer can be stored into the dst_ops structure. This is usefull when there are multiple instances of the dst_ops for a protocol. When there are no several instances, this field will be never used in the protocol. So there is no impact for the protocols which do implement the network namespaces. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][DST] dst: pass the dst_ops as parameter to the gc functionsDaniel Lezcano1-1/+1
The garbage collection function receive the dst_ops structure as parameter. This is useful for the next incoming patchset because it will need the dst_ops (there will be several instances) and the network namespace pointer (contained in the dst_ops). The protocols which do not take care of the namespaces will not be impacted by this change (expect for the function signature), they do just ignore the parameter. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: Remove unused member of dst_entryRami Rosen1-1/+0
The info placeholder member of dst_entry seems to be unused in the network stack. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Add ICMP host relookup supportHerbert Xu1-0/+1
RFC 4301 requires us to relookup ICMP traffic that does not match any policies using the reverse of its payload. This patch implements this for ICMP traffic that originates from or terminates on localhost. This is activated on outbound with the new policy flag XFRM_POLICY_ICMP, and on inbound by the new state flag XFRM_STATE_ICMP. On inbound the policy check is now performed by the ICMP protocol so that it can repeat the policy check where necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Make xfrm_lookup flags argument a bit-fieldHerbert Xu1-0/+5
This patch introduces an enum for bits in the flags argument of xfrm_lookup. This is so that we can cram more information into it later. Since all current users use just the values 0 and 1, XFRM_LOOKUP_WAIT has been added with the value 1 << 0 to represent the current meaning of flags. The test in __xfrm_lookup has been changed accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Merge most of the output pathHerbert Xu1-0/+1
As part of the work on asynchrnous cryptographic operations, we need to be able to resume from the spot where they occur. As such, it helps if we isolate them to one spot. This patch moves most of the remaining family-specific processing into the common output code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: Eliminate duplicate copies of dst_discardHerbert Xu1-0/+1
We have a number of copies of dst_discard scattered around the place which all do the same thing, namely free a packet on the input or output paths. This patch deletes all of them except dst_discard and points all the users to it. The only non-trivial bit is decnet where it returns an error. However, conceptually this is identical to the blackhole functions used in IPv4 and IPv6 which do not return errors. So they should either all return errors or all return zero. For now I've stuck with the majority and picked zero as the return value. It doesn't really matter in practice since few if any driver would react differently depending on a zero return value or NET_RX_DROP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV6]: Move nfheader_len into rt6_infoHerbert Xu1-1/+0
The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. It also reorders the fields in rt6_info to minimize holes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10[NET]: Make helper to get dst entry and "use" itPavel Emelyanov1-0/+7
There are many places that get the dst entry, increase the __use counter and set the "lastuse" time stamp. Make a helper for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10[IPV4]: The scheduled removal of multipath cached routing support.David S. Miller1-1/+0
With help from Chris Wedgwood. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[XFRM]: Allow packet drops during larval state resolution.David S. Miller1-0/+7
The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[NET]: Reorder fields of struct dst_entryEric Dumazet1-10/+10
This last patch (but not least :) ) finally moves the next pointer at the end of struct dst_entry. This permits to perform route cache lookups with a minimal cost of one cache line per entry, instead of two. Both 32bits and 64bits platforms benefit from this new layout. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[NET]: Introduce union in struct dst_entry to hold 'next' pointerEric Dumazet1-1/+6
This patch introduces an anonymous union to nicely express the fact that all objects inherited from struct dst_entry should access to the generic 'next' pointer but with appropriate type verification. This patch is a prereq before following patches. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-07[PATCH] slab: remove kmem_cache_tChristoph Lameter1-1/+1
Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-28[NET]: Annotate dst_ops protocolAl Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22[XFRM] STATE: Support non-fragment outbound transformation headers.Masahide NAKAMURA1-0/+1
For originated outbound IPv6 packets which will fragment, ip6_append_data() should know length of extension headers before sending them and the length is carried by dst_entry. IPv6 IPsec headers fragment then transformation was designed to place all headers after fragment header. OTOH Mobile IPv6 extension headers do not fragment then it is a good idea to make dst_entry have non-fragment length to tell it to ip6_append_data(). Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-26Don't include linux/config.h from anywhere else in include/David Woodhouse1-1/+0
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-01-07[XFRM]: Netfilter IPsec output hooksPatrick McHardy1-10/+1
Call netfilter hooks before IPsec transforms. Packets visit the FORWARD/LOCAL_OUT and POST_ROUTING hook before the first encapsulation and the LOCAL_OUT and POST_ROUTING hook before each following tunnel mode transform. Patch from Herbert Xu <herbert@gondor.apana.org.au>: Move the loop from dst_output into xfrm4_output/xfrm6_output since they're the only ones who need to it. xfrm{4,6}_output_one() processes the first SA all subsequent transport mode SAs and is called in a loop that calls the netfilter hooks between each two calls. In order to avoid the tail call issue, I've added the inline function nf_hook which is nf_hook_slow plus the empty list check. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.hArnaldo Carvalho de Melo1-0/+1
To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h include twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>