aboutsummaryrefslogtreecommitdiffstats
path: root/net/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2008-11-16net: Convert TCP & DCCP hash tables to use RCU / hlist_nullsEric Dumazet1-1/+3
RCU was added to UDP lookups, using a fast infrastructure : - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the price of call_rcu() at freeing time. - hlist_nulls permits to use few memory barriers. This patch uses same infrastructure for TCP/DCCP established and timewait sockets. Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications using short lived TCP connections. A followup patch, converting rwlocks to spinlocks will even speedup this case. __inet_lookup_established() is pretty fast now we dont have to dirty a contended cache line (read_lock/read_unlock) Only established and timewait hashtable are converted to RCU (bind table and listen table are still using traditional locking) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-14scm: fix scm_fp_list->list initialization made in wrong placePavel Emelyanov1-2/+0
This is the next page of the scm recursion story (the commit f8d570a4 net: Fix recursive descent in __scm_destroy()). In function scm_fp_dup(), the INIT_LIST_HEAD(&fpl->list) of newly created fpl is done *before* the subsequent memcpy from the old structure and thus the freshly initialized list is overwritten. But that's OK, since this initialization is not required at all, since the fpl->list is list_add-ed at the destruction time in any case (and is unused in other code), so I propose to drop both initializations, rather than moving it after the memcpy. Please, correct me if I miss something significant. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-14net: speedup dst_release()Eric Dumazet1-2/+4
During tbench/oprofile sessions, I found that dst_release() was in third position. CPU: Core 2, speed 2999.68 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 samples % symbol name 483726 9.0185 __copy_user_zeroing_intel 191466 3.5697 __copy_user_intel 185475 3.4580 dst_release 175114 3.2648 ip_queue_xmit 153447 2.8608 tcp_sendmsg 108775 2.0280 tcp_recvmsg 102659 1.9140 sysenter_past_esp 101450 1.8914 tcp_current_mss 95067 1.7724 __copy_from_user_ll 86531 1.6133 tcp_transmit_skb Of course, all CPUS fight on the dst_entry associated with 127.0.0.1 Instead of first checking the refcount value, then decrement it, we use atomic_dec_return() to help CPU to make the right memory transaction (ie getting the cache line in exclusive mode) dst_release() is now at the fifth position, and tbench a litle bit faster ;) CPU: Core 2, speed 3000.1 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 samples % symbol name 647107 8.8072 __copy_user_zeroing_intel 258840 3.5229 ip_queue_xmit 258302 3.5155 __copy_user_intel 209629 2.8531 tcp_sendmsg 165632 2.2543 dst_release 149232 2.0311 tcp_current_mss 147821 2.0119 tcp_recvmsg 137893 1.8767 sysenter_past_esp 127473 1.7349 __copy_from_user_ll 121308 1.6510 ip_finish_output 118510 1.6129 tcp_transmit_skb 109295 1.4875 tcp_v4_rcv Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-13lockdep: include/linux/lockdep.h - fix warning in net/bluetooth/af_bluetooth.cIngo Molnar1-2/+0
fix this warning: net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used this is a lockdep macro problem in the !LOCKDEP case. We cannot convert it to an inline because the macro works on multiple types, but we can mark the parameter used. [ also clean up a misaligned tab in sock_lock_init_class_and_name() ] [ also remove #ifdefs from around af_family_clock_key strings - which were certainly added to get rid of the ugly build warnings. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-14Merge branch 'master' into nextJames Morris3-4/+25
Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14CRED: Wrap current->cred and a few other accessorsDavid Howells1-1/+1
Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14CRED: Separate task security context from task_structDavid Howells1-4/+6
Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14CRED: Wrap task credential accesses in the networking subsystemDavid Howells2-6/+10
Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: netdev@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2008-11-12net: Cleanup of neighbour codeEric Dumazet1-9/+3
Using read_pnet() and write_pnet() in neighbour code ease the reading of code. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-11net: remove struct neigh_table::pdeAlexey Dobriyan1-3/+2
->pde isn't actually needed, since name is stashed in ->id. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-11Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller1-1/+1
Conflicts: drivers/message/fusion/mptlan.c drivers/net/sfc/ethtool.c net/mac80211/debugfs_sta.c
2008-11-10net: fix setting of skb->tail in skb_recycle_check()Lennert Buytenhek1-1/+1
Since skb_reset_tail_pointer() reads skb->data, we need to set skb->data before calling skb_reset_tail_pointer(). This was causing spurious skb_over_panic()s from skb_put() being called on a recycled skb that had its skb->tail set to beyond where it should have been. Bug report from Peter van Valderen <linux@ddcrew.com>. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-10pktgen: add full reset functionalityJesse Brandeburg1-0/+22
While testing pktgen, I found that sometimes my configurations from previous runs would be left over, particularly when going from a test with 8 threads down to a test with 4 threads. This adds new functionality to pktgen where you can call pgset "reset" and it will be just like you just insmod'ed pktgen again. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-10net: struct device - replace bus_id with dev_name(), dev_set_name()Kay Sievers1-1/+1
Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-07net: Guaranetee the proper ordering of the loopback device. v2Eric W. Biederman1-5/+17
I was recently hunting a bug that occurred in network namespace cleanup. In looking at the code it became apparrent that we have and will continue to have cases where if we have anything going on in a network namespace there will be assumptions that the loopback device is present. Things like sending igmp unsubscribe messages when we bring down network devices invokes the routing code which assumes that at least the loopback driver is present. Therefore to avoid magic initcall ordering hackery that is hard to follow and hard to get right insert a call to register the loopback device directly from net_dev_init(). This guarantes that the loopback device is the first device registered and the last network device to go away. But do it carefully so we register the loopback device after we clear dev_boot_phase. Signed-off-by: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-07net: fib_rules ordering fixes.Eric W. Biederman1-3/+4
We need to setup the network namespace state before we register the notifier. Otherwise if a network device is already registered we get a nasty NULL pointer dereference. Signed-off-by: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-07Revert "net: Guaranetee the proper ordering of the loopback device."David S. Miller1-12/+0
This reverts commit ae33bc40c0d96d02f51a996482ea7e41c5152695.
2008-11-06net: mark flow_cache_cpu_prepare() as __initAlexey Dobriyan1-1/+1
It's called from __init code only. And__devinit in generic networking code is pretty strange :^) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-06Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller3-3/+36
Conflicts: drivers/net/wireless/ath5k/base.c net/8021q/vlan_core.c
2008-11-06net: Fix recursive descent in __scm_destroy().David S. Miller1-3/+21
__scm_destroy() walks the list of file descriptors in the scm_fp_list pointed to by the scm_cookie argument. Those, in turn, can close sockets and invoke __scm_destroy() again. There is nothing which limits how deeply this can occur. The idea for how to fix this is from Linus. Basically, we do all of the fput()s at the top level by collecting all of the scm_fp_list objects hit by an fput(). Inside of the initial __scm_destroy() we keep running the list until it is empty. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05net: Don't leak packets when a netns is going downEric W. Biederman1-1/+3
I have been tracking for a while a case where when the network namespace exits the cleanup gets stck in an endless precessess of: unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 unregister_netdevice: waiting for lo to become free. Usage count = 3 It turns out that if you listen on a multicast address an unsubscribe packet is sent when the network device goes down. If you shutdown the network namespace without carefully cleaning up this can trigger the unsubscribe packet to be sent over the loopback interface while the network namespace is going down. All of which is fine except when we drop the packet and forget to free it leaking the skb and the dst entry attached to. As it turns out the dst entry hold a reference to the idev which holds the dev and keeps everything from being cleaned up. Yuck! By fixing my earlier thinko and add the needed kfree_skb and everything cleans up beautifully. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05net: Guaranetee the proper ordering of the loopback device.Eric W. Biederman1-0/+12
I was recently hunting a bug that occurred in network namespace cleanup. In looking at the code it became apparrent that we have and will continue to have cases where if we have anything going on in a network namespace there will be assumptions that the loopback device is present. Things like sending igmp unsubscribe messages when we bring down network devices invokes the routing code which assumes that at least the loopback driver is present. Therefore to avoid magic initcall ordering hackery that is hard to follow and hard to get right insert a call to register the loopback device directly from net_dev_init(). This guarantes that the loopback device is the first device registered and the last network device to go away. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05netns: Delete virtual interfaces during namespace cleanupEric W. Biederman1-0/+6
When physical devices are inside of network namespace and that network namespace terminates we can not make them go away. We have to keep them and moving them to the initial network namespace is the best we can do. For virtual devices left in a network namespace that is exiting we have no need to preserve them and we now have the infrastructure that allows us to delete them. So delete virtual devices when we exit a network namespace. Keeping the necessary user space clean up after a network namespace exits much more tractable. Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05net: sk_free_datagram() should use sk_mem_reclaim_partial()Eric Dumazet1-3/+2
I noticed a contention on udp_memory_allocated on regular UDP applications. While tcp_memory_allocated is seldom used, it appears each incoming UDP frame is currently touching udp_memory_allocated when queued, and when received by application. One possible solution is to use sk_mem_reclaim_partial() instead of sk_mem_reclaim(), so that we keep a small reserve (less than one page) of memory for each UDP socket. We did something very similar on TCP side in commit 9993e7d313e80bdc005d09c7def91903e0068f07 ([TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer()) A more complex solution would need to convert prot->memory_allocated to use a percpu_counter with batches of 64 or 128 pages. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04net: fix packet socket delivery in rx irq handlerPatrick McHardy1-0/+3
The changes to deliver hardware accelerated VLAN packets to packet sockets (commit bc1d0411) caused a warning for non-NAPI drivers. The __vlan_hwaccel_rx() function is called directly from the drivers RX function, for non-NAPI drivers that means its still in RX IRQ context: [ 27.779463] ------------[ cut here ]------------ [ 27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81() ... [ 27.782520] [<c0264755>] netif_nit_deliver+0x5b/0x75 [ 27.782590] [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162 [ 27.782664] [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1] [ 27.782738] [<c0155b17>] handle_IRQ_event+0x23/0x51 [ 27.782808] [<c015692e>] handle_edge_irq+0xc2/0x102 [ 27.782878] [<c0105fd5>] do_IRQ+0x4d/0x64 Split hardware accelerated VLAN reception into two parts to fix this: - __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN device lookup, then calls netif_receive_skb()/netif_rx() - vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb() in softirq context, performs the real reception and delivery to packet sockets. Reported-and-tested-by: Ramon Casellas <ramon.casellas@cttc.es> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03net: '&' reduxAlexey Dobriyan2-41/+41
I want to compile out proc_* and sysctl_* handlers totally and stub them to NULL depending on config options, however usage of & will prevent this, since taking adress of NULL pointer will break compilation. So, drop & in front of every ->proc_handler and every ->strategy handler, it was never needed in fact. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03net: increase receive packet quantumStephen Hemminger1-7/+4
This patch gets about 1.25% back on tbench regression. My change to NAPI for multiqueue support changed the time limit on network receive processing. Under sustained loads like tbench, this can cause the receiver to reschedule prematurely. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-01net: add documentation for skb recyclingStephen Hemminger1-0/+12
Commit 04a4bb55bcf35b63d40fd2725e58599ff8310dd7 ("net: add skb_recycle_check() to enable netdriver skb recycling") added a method for network drivers to recycle skbuffs, but while use of this mechanism was documented in the commit message, it should really have been added as a docbook comment as well -- this patch does that. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31mac80211: Re-enable aggregationSujith1-0/+1
Wireless HW without any dedicated queues for aggregation do not need the ampdu_queues mechanism present right now in mac80211. Since mac80211 is still incomplete wrt TX MQ changes, do not allow aggregation sessions for drivers that set ampdu_queues. This is only an interim hack until Intel fixes the requeue issue. Signed-off-by: Sujith <Sujith.Manoharan@atheros.com> Signed-off-by: Luis Rodriguez <Luis.Rodriguez@Atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller2-14/+45
Conflicts: drivers/net/wireless/p54/p54common.c
2008-10-30netns: add register_pernet_gen_subsys/unregister_pernet_gen_subsysAlexey Dobriyan1-0/+32
netns ops which are registered with register_pernet_gen_device() are shutdown strictly before those which are registered with register_pernet_subsys(). Sometimes this leads to opposite (read: buggy) shutdown ordering between two modules. Add register_pernet_gen_subsys()/unregister_pernet_gen_subsys() for modules which aren't elite enough for entry in struct net, and which can't use register_pernet_gen_device(). PPTP conntracking module is such one. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: RCU handling for Unicast packets.Eric Dumazet1-1/+2
Goals are : 1) Optimizing handling of incoming Unicast UDP frames, so that no memory writes should happen in the fast path. Note: Multicasts and broadcasts still will need to take a lock, because doing a full lockless lookup in this case is difficult. 2) No expensive operations in the socket bind/unhash phases : - No expensive synchronize_rcu() calls. - No added rcu_head in socket structure, increasing memory needs, but more important, forcing us to use call_rcu() calls, that have the bad property of making sockets structure cold. (rcu grace period between socket freeing and its potential reuse make this socket being cold in CPU cache). David did a previous patch using call_rcu() and noticed a 20% impact on TCP connection rates. Quoting Cristopher Lameter : "Right. That results in cacheline cooldown. You'd want to recycle the object as they are cache hot on a per cpu basis. That is screwed up by the delayed regular rcu processing. We have seen multiple regressions due to cacheline cooldown. The only choice in cacheline hot sensitive areas is to deal with the complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU." - Because udp sockets are allocated from dedicated kmem_cache, use of SLAB_DESTROY_BY_RCU can help here. Theory of operation : --------------------- As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()), special attention must be taken by readers and writers. Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed, reused, inserted in a different chain or in worst case in the same chain while readers could do lookups in the same time. In order to avoid loops, a reader must check each socket found in a chain really belongs to the chain the reader was traversing. If it finds a mismatch, lookup must start again at the begining. This *restart* loop is the reason we had to use rdlock for the multicast case, because we dont want to send same message several times to the same socket. We use RCU only for fast path. Thus, /proc/net/udp still takes spinlocks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: don't use INIT_RCU_HEADAlexey Dobriyan2-4/+0
call_rcu() will unconditionally rewrite RCU head anyway. Applies to struct neigh_parms struct neigh_table struct net struct cipso_v4_doi struct in_ifaddr struct in_device rt->u.dst Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: reduce structures when XFRM=nAlexey Dobriyan1-1/+1
ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28pktgen: fix multiple queue warningJesse Brandeburg1-14/+13
when testing the new pktgen module with multiple queues and ixgbe with: pgset "flag QUEUE_MAP_CPU" I found that I was getting errors in dmesg like: pktgen: WARNING: QUEUE_MAP_CPU disabled because CPU count (8) exceeds number <4>pktgen: WARNING: of tx queues (8) on eth15 you'll note, 8 really doesn't exceed 8. This patch seemed to fix the logic errors and also the attempts at limiting line length in printk (which didn't work anyway) Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27netns: Coexist with the sysfs limitations v2Eric W. Biederman2-5/+27
To make testing of the network namespace simpler allow the network namespace code and the sysfs code to be compiled and run at the same time. To do this only virtual devices are allowed in the additional network namespaces and those virtual devices are not placed in the kobject tree. Since virtual devices don't actually do anything interesting hardware wise that needs device management there should be no loss in keeping them out of the kobject tree and by implication sysfs. The gain in ease of testing and code coverage should be significant. Changelog: v2: As pointed out by Benjamin Thery it only makes sense to call device_rename in the initial network namespace for now. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Tested-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27net: convert print_mac to %pMJohannes Berg2-8/+6
This converts pretty much everything to print_mac. There were a few things that had conflicts which I have just dropped for now, no harm done. I've built an allyesconfig with this and looked at the files that weren't built very carefully, but it's a huge patch. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-23net: Fix disjunct computation of netdev featuresHerbert Xu1-64/+71
My change commit e2a6b85247aacc52d6ba0d9b37a99b8d1a3e0d83 net: Enable TSO if supported by at least one device didn't do what was intended because the netdev_compute_features function was designed for conjunctions. So what happened was that it would simply take the TSO status of the last constituent device. This patch extends it to support both conjunctions and disjunctions under the new name of netdev_increment_features. It also adds a new function netdev_fix_features which does the sanity checking that usually occurs upon registration. This ensures that the computation doesn't result in an illegal combination since this checking is absent when the change is initiated via ethtool. The two users of netdev_compute_features have been converted. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-19netdev: change name dropping error codesStephen Hemminger1-3/+3
If changename notifier returns an error code, it gets incorrectly cleared during rollback so the error is never returned to the user. Found while testing similar code for MTU changes. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-16net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely)Johannes Berg2-4/+2
Some code here depends on CONFIG_KMOD to not try to load protocol modules or similar, replace by CONFIG_MODULES where more than just request_module depends on CONFIG_KMOD and and also use try_then_request_module in ebtables. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-14netns: fix net_generic array leakAlexey Dobriyan1-1/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-13net: Rationalise email address: Network Specific PartsAlan Cox4-4/+4
Clean up the various different email addresses of mine listed in the code to a single current and valid address. As Dave says his network merges for 2.6.28 are now done this seems a good point to send them in where they won't risk disrupting real changes. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-13pktgen: fix skb leak in case of failureIlpo Järvinen1-3/+5
Seems that skb goes into void unless something magic happened in pskb_expand_head in case of failure. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-12net: Fix off-by-one in skb_dma_mapDimitris Michailidis1-1/+1
The unwind loop iterates down to -1 instead of stopping at 0 and ends up accessing ->frags[-1]. Signed-off-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller2-28/+17
Conflicts: drivers/net/e1000e/ich8lan.c drivers/net/e1000e/netdev.c
2008-10-08netns: export netns listAlexey Dobriyan1-0/+1
Conntrack code will use it for a) removing expectations and helpers when corresponding module is removed, and b) removing conntracks when L3 protocol conntrack module is removed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-07net: Fix netdev_run_todo dead-lockHerbert Xu2-22/+7
Benjamin Thery tracked down a bug that explains many instances of the error unregister_netdevice: waiting for %s to become free. Usage count = %d It turns out that netdev_run_todo can dead-lock with itself if a second instance of it is run in a thread that will then free a reference to the device waited on by the first instance. The problem is really quite silly. We were trying to create parallelism where none was required. As netdev_run_todo always follows a RTNL section, and that todo tasks can only be added with the RTNL held, by definition you should only need to wait for the very ones that you've added and be done with it. There is no need for a second mutex or spinlock. This is exactly what the following patch does. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07net: only invoke dev->change_rx_flags when device is UPPatrick McHardy1-6/+10
Jesper Dangaard Brouer <hawk@comx.dk> reported a bug when setting a VLAN device down that is in promiscous mode: When the VLAN device is set down, the promiscous count on the real device is decremented by one by vlan_dev_stop(). When removing the promiscous flag from the VLAN device afterwards, the promiscous count on the real device is decremented a second time by the vlan_change_rx_flags() callback. The root cause for this is that the ->change_rx_flags() callback is invoked while the device is down. The synchronization is meant to mirror the behaviour of the ->set_rx_mode callbacks, meaning the ->open function is responsible for doing a full sync on open, the ->close() function is responsible for doing full cleanup on ->stop() and ->change_rx_flags() is meant to do incremental changes while the device is UP. Only invoke ->change_rx_flags() while the device is UP to provide the intended behaviour. Tested-by: Jesper Dangaard Brouer <jdb@comx.dk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07net: packet split receive apiPeter Zijlstra1-0/+20
Add some packet-split receive hooks. For one this allows to do NUMA node affine page allocs. Later on these hooks will be extended to do emergency reserve allocations for fragments. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07net: wrap sk->sk_backlog_rcv()Peter Zijlstra1-2/+2
Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the generic sk_backlog_rcv behaviour. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>