path: root/net/core
Age  Commit message  Author  Files  Lines
2010-04-03  net: move address list functions to a separate file  Jiri Pirko  3 files, -428/+483
Also rename the unicast functions slightly to keep them consistent with the multicast ones. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-02  net: illegal_highdma() fix  Eric Dumazet  1 file, -1/+3
Followup to commit 5acbbd428db47b12f137a8a2aa96b3c0a96b744e (net: change illegal_highdma to use dma_mask). If dev->dev.parent is NULL, we should not try to dereference it. Don't force inline illegal_highdma() as it is pretty big now. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-01  net: change illegal_highdma to use dma_mask  FUJITA Tomonori  1 file, -6/+14
Robert Hancock pointed out two problems with NETIF_F_HIGHDMA: - Many drivers only set the flag when they detect they can use 64-bit DMA, since otherwise they could receive DMA addresses that they can't handle (which on platforms without IOMMU/SWIOTLB support is fatal). This means that if 64-bit support isn't available, even buffers located below 4GB will get copied unnecessarily. - Some drivers set the flag even though they can't actually handle 64-bit DMA, which would mean that on platforms without IOMMU/SWIOTLB they would get a DMA mapping error if the memory they received happened to be located above 4GB. http://lkml.org/lkml/2010/3/3/530 We can use the dma_mask to decide whether bouncing is needed here. Then we can safely fix drivers that misuse NETIF_F_HIGHDMA. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
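The two illegal_highdma() entries above describe a check against the parent device's DMA mask plus a guard for a missing parent. A minimal sketch of that idea (simplified, HIGHMEM handling omitted; not the exact upstream function):

    /* Illustrative sketch only: does any fragment fall outside the
     * parent device's DMA mask? */
    static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
    {
            struct device *pdev = dev->dev.parent;
            int i;

            if (!pdev || !pdev->dma_mask)   /* no parent: nothing to check (the 2010-04-02 fix) */
                    return 0;

            for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                    u64 addr = page_to_phys(skb_shinfo(skb)->frags[i].page);

                    if (addr + PAGE_SIZE - 1 > *pdev->dma_mask)
                            return 1;       /* fragment not addressable by the device */
            }
            return 0;
    }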
2010-04-01  flow: structurize flow cache  Timo Teräs  1 file, -104/+119
Group all per-cpu data into one structure instead of having many globals. Also prepare the internals so that we can have multiple instances of the flow cache if needed. Only the kmem_cache is left as a global, as all flow caches share the same element size and benefit from using a common cache. Signed-off-by: Timo Teras <timo.teras@iki.fi> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
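A sketch of the grouping the entry describes (field names are illustrative, not necessarily the exact upstream layout): the formerly global per-cpu variables become members of a per-cpu structure hung off a flow_cache instance, so several caches could coexist later.

    /* Illustrative layout only. */
    struct flow_cache_percpu {
            struct flow_cache_entry **hash_table;   /* per-cpu buckets */
            int                     hash_count;
            u32                     hash_rnd;
            int                     hash_rnd_recalc;
            struct tasklet_struct   flush_tasklet;
    };

    struct flow_cache {
            u32                          hash_shift;
            struct flow_cache_percpu     *percpu;        /* from alloc_percpu() */
            struct notifier_block        hotcpu_notifier;
            int                          low_watermark;
            int                          high_watermark;
            struct timer_list            rnd_timer;      /* periodic hash_rnd refresh */
    };

Only the kmem_cache for the entries themselves would remain a true global, as the entry explains.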
2010-04-01  rps: keep the old behavior on SMP without rps  Changli Gao  1 file, -14/+28
RPS introduces a lock operation on the per-CPU variable input_pkt_queue on SMP whether or not RPS is enabled. On SMP without RPS, this lock isn't needed at all. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/core/dev.c | 42 ++++++++++++++++++++++++++++-------------- 1 file changed, 28 insertions(+), 14 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30  fix net/core/dst.c coding style error and warnings  laurent chavey  1 file, -21/+20
Fix coding style errors and warnings output while running checkpatch.pl on the file net/core/dst.c. Signed-off-by: chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30  netdev: ethtool RXHASH flag  stephen hemminger  1 file, -1/+6
This adds an ethtool flag and a device feature flag to allow control of receive hashing offload. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-29  rps: fix net-sysfs build for !CONFIG_RPS  Stephen Rothwell  1 file, -3/+4
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-28  net: __netif_receive_skb should be static  Eric Dumazet  1 file, -1/+1
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27  net: increase preallocated size of nlmsg to accommodate IFLA_STATS64  Jan Engelhardt  1 file, -0/+1
When more data is stuffed into an nlmsg than initially projected, an extra allocation needs to be done. Reserve enough for IFLA_STATS64 so that this does not needlessly happen. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27  net: fix unaligned access in IFLA_STATS64  Jan Engelhardt  1 file, -31/+31
Tony Luck observes that the original IFLA_STATS64 submission causes unaligned accesses. This is because nla_data() returns a pointer to a memory region that is only aligned to 32 bits. Do some memcpying to work around this. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
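A sketch of the workaround described above (simplified): build a properly typed struct on the stack and memcpy() it into the attribute payload, since nla_data() only guarantees 32-bit alignment.

    /* Illustrative sketch only: never cast nla_data() to a u64-aligned struct. */
    static void copy_link_stats64(void *dst, const struct net_device_stats *src)
    {
            struct rtnl_link_stats64 tmp;

            memset(&tmp, 0, sizeof(tmp));
            tmp.rx_packets = src->rx_packets;
            tmp.tx_packets = src->tx_packets;
            tmp.rx_bytes   = src->rx_bytes;
            tmp.tx_bytes   = src->tx_bytes;
            /* ... remaining counters copied the same way ... */

            memcpy(dst, &tmp, sizeof(tmp));  /* dst may be only 32-bit aligned */
    }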
2010-03-25  rps: add CONFIG_RPS  Eric Dumazet  1 file, -10/+19
RPS currently depends on SMP and SYSFS. Adding a CONFIG_RPS makes sense in case this requirement changes in the future. This patch saves about 1500 bytes of kernel text in case SMP is on but SYSFS is off. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-23  net: Fix locking in flush_backlog  Tom Herbert  1 file, -4/+8
Need to take spinlocks when dequeuing from input_pkt_queue in flush_backlog. Also, flush_backlog can now be called directly from netdev_run_todo. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
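A sketch of the locking the entry describes (signature simplified; not the exact upstream code): dequeue the device's packets from input_pkt_queue only while holding the queue lock.

    /* Illustrative sketch only. */
    static void flush_backlog(struct net_device *dev, struct softnet_data *queue)
    {
            struct sk_buff *skb, *tmp;
            unsigned long flags;

            spin_lock_irqsave(&queue->input_pkt_queue.lock, flags);
            skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp) {
                    if (skb->dev == dev) {
                            __skb_unlink(skb, &queue->input_pkt_queue);
                            kfree_skb(skb);
                    }
            }
            spin_unlock_irqrestore(&queue->input_pkt_queue.lock, flags);
    }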
2010-03-22  rps: Fix build with CONFIG_SYSFS enabled  Tom Herbert  1 file, -0/+4
Fix build with CONFIG_SYSFS not enabled. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21  pktgen node allocation  Robert Olsson  1 file, -5/+53
Here is a patch to control packet node allocation and, implicitly, how packets are DMA'd, etc. The NODE_ALLOC flag enables the feature and uses numa_node_id(); when enabled, the node can also be controlled explicitly via a new node parameter. Tested this with 10 Intel 82599 ports on a TYAN S7025 with E5520 CPUs; was able to TX/DMA ~80 Gbit/s to the Ethernet wires. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21  net: dev_getfirstbyhwtype() optimization  Eric Dumazet  1 file, -7/+10
Use RCU to avoid RTNL use in dev_getfirstbyhwtype() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
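A sketch of the RCU-based lookup the entry describes (close to, but not guaranteed to match, the upstream function): walk the device list under rcu_read_lock() instead of requiring the RTNL.

    /* Illustrative sketch only. */
    struct net_device *dev_getfirstbyhwtype(struct net *net, unsigned short type)
    {
            struct net_device *dev, *ret = NULL;

            rcu_read_lock();
            for_each_netdev_rcu(net, dev) {
                    if (dev->type == type) {
                            dev_hold(dev);          /* caller receives a reference */
                            ret = dev;
                            break;
                    }
            }
            rcu_read_unlock();
            return ret;
    }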
2010-03-21  net: speedup netdev_set_master()  Eric Dumazet  1 file, -4/+3
We currently force a synchronize_net() in netdev_set_master(). This seems necessary only when a slave had a master and we dismantle it. In the other case ("ifenslave bond0 eth0"), we don't need this long delay. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
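A sketch of the idea (simplified, IFF_SLAVE and netlink handling omitted; not the exact upstream function): only pay the synchronize_net() cost when an existing master is being detached.

    /* Illustrative sketch only. */
    int netdev_set_master(struct net_device *slave, struct net_device *master)
    {
            struct net_device *old = slave->master;

            ASSERT_RTNL();

            if (master) {
                    if (old)
                            return -EBUSY;
                    dev_hold(master);
            }

            slave->master = master;

            if (old) {
                    synchronize_net();      /* only needed when dismantling an old master */
                    dev_put(old);
            }
            return 0;
    }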
2010-03-21  bonding: flush unicast and multicast lists when changing type  Jiri Pirko  1 file, -2/+4
After the type change, addresses in the unicast and multicast lists wouldn't make sense, not to mention the possibly different address lengths. So flush both lists here. Note that "dev_addr_discard" will very soon be replaced by "dev_mc_flush" (once the mc_list conversion is done). Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21  net: rtnetlink: ignore NETDEV_PRE_TYPE_CHANGE in rtnetlink_event()  Patrick McHardy  1 file, -0/+1
Ignore the new NETDEV_PRE_TYPE_CHANGE event in rtnetlink_event() since there have been no changes userspace needs to be notified of. Also add a comment to the netdev notifier event definitions to remind people to update the exclusion list when adding new event types. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6  David S. Miller  1 file, -3/+5
2010-03-18  net: Potential null skb->dev dereference  Eric Dumazet  1 file, -3/+5
When doing "ifenslave -d bond0 eth0", there is a chance of getting a NULL dereference in netif_receive_skb(), because dev->master suddenly becomes NULL after we tested it. We should use ACCESS_ONCE() to avoid this (or rcu_dereference()). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18  bonding: check return value of notifier when changing type  Jiri Pirko  1 file, -2/+2
This patch adds the possibility for other subsystems (such as, for example, bridge, vlan, etc.) to refuse the bonding type change. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
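A sketch of how such a veto is typically expressed with the notifier mentioned in the entries above (illustrative, not the exact bonding code): convert the notifier chain's return value into an errno and bail out before touching the device.

    /* Illustrative sketch only. */
    int err;

    err = call_netdevice_notifiers(NETDEV_PRE_TYPE_CHANGE, dev);
    err = notifier_to_errno(err);
    if (err)
            return err;     /* a subsystem (bridge, vlan, ...) refused the type change */

    /* ... proceed with the actual type change ... */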
2010-03-18  rps: Fixed build with CONFIG_SMP not enabled.  Tom Herbert  1 file, -0/+24
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16  net: core: add IFLA_STATS64 support  Jan Engelhardt  1 file, -1/+41
`ip -s link` shows interface counters truncated to 32 bits. This is because interface statistics are transported to userspace only as 32-bit quantities. This commit adds a new IFLA_STATS64 attribute that exports them in full 64 bits. References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.html Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16  net: remove rcu locking from fib_rules_event()  Eric Dumazet  1 file, -8/+2
We hold the RTNL at this point and don't use RCU variants of the list traversals, so we don't need rcu_read_lock()/rcu_read_unlock(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16  rps: Receive Packet Steering  Tom Herbert  3 files, -57/+505
This patch implements software receive side packet steering (RPS). RPS distributes the load of received packet processing across multiple CPUs. Problem statement: Protocol processing done in the NAPI context for received packets is serialized per device queue and becomes a bottleneck under high packet load. This substantially limits pps that can be achieved on a single queue NIC and provides no scaling with multiple cores. This solution queues packets early on in the receive path on the backlog queues of other CPUs. This allows protocol processing (e.g. IP and TCP) to be performed on packets in parallel. For each device (or each receive queue in a multi-queue device) a mask of CPUs is set to indicate the CPUs that can process packets. A CPU is selected on a per packet basis by hashing contents of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index into the CPU mask. The IPI mechanism is used to raise networking receive softirqs between CPUs. This effectively emulates in software what a multi-queue NIC can provide, but is generic, requiring no device support. Many devices now provide a hash over the 4-tuple on a per packet basis (e.g. the Toeplitz hash). This patch allows drivers to set the HW reported hash in an skb field, and that value in turn is used to index into the RPS maps. Using the HW generated hash can avoid cache misses on the packet when steering it to a remote CPU. The CPU mask is set on a per device and per queue basis in the sysfs variable /sys/class/net/<device>/queues/rx-<n>/rps_cpus. This is a set of canonical bit maps for receive queues in the device (numbered by <n>). If a device does not support multi-queue, a single variable is used for the device (rx-0). Generally, we have found this technique increases pps capabilities of a single queue device with good CPU utilization. Optimal settings for the CPU mask seem to depend on architectures and cache hierarchy. Below are some results running 500 instances of netperf TCP_RR test with 1 byte req. and resp. Results show cumulative transaction rate and system CPU utilization. e1000e on 8 core Intel Without RPS: 108K tps at 33% CPU With RPS: 311K tps at 64% CPU forcedeth on 16 core AMD Without RPS: 156K tps at 15% CPU With RPS: 404K tps at 49% CPU bnx2x on 16 core AMD Without RPS 567K tps at 61% CPU (4 HW RX queues) Without RPS 738K tps at 96% CPU (8 HW RX queues) With RPS: 854K tps at 76% CPU (4 HW RX queues) Caveats: - The benefits of this patch are dependent on architecture and cache hierarchy. Tuning the masks to get best performance is probably necessary. - This patch adds overhead in the path for processing a single packet. In a lightly loaded server this overhead may eliminate the advantages of increased parallelism, and possibly cause some relative performance degradation. We have found that masks that are cache aware (share same caches with the interrupting CPU) mitigate much of this. - The RPS masks can be changed dynamically, however whenever the mask is changed this introduces the possibility of generating out of order packets. It's probably best not to change the masks too frequently. Signed-off-by: Tom Herbert <therbert@google.com> include/linux/netdevice.h | 32 ++++- include/linux/skbuff.h | 3 + net/core/dev.c | 335 +++++++++++++++++++++++++++++++++++++-------- net/core/net-sysfs.c | 225 ++++++++++++++++++++++++++++++- net/core/skbuff.c | 2 + 5 files changed, 538 insertions(+), 59 deletions(-) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
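A sketch of the per-packet CPU selection the entry describes (structure and names simplified from the text, not the exact upstream code): hash the flow, then use the hash to index into the per-queue CPU map configured through sysfs.

    /* Illustrative sketch only. */
    struct rps_map {
            unsigned int len;       /* number of CPUs configured in rps_cpus */
            u16 cpus[];             /* CPUs that packets may be steered to */
    };

    static int rps_select_cpu(const struct rps_map *map, u32 flow_hash)
    {
            if (!map || !map->len)
                    return -1;      /* RPS not configured: process locally */

            /* scale the 32-bit hash into [0, len) without a division */
            return map->cpus[((u64)flow_hash * map->len) >> 32];
    }

The map itself would be populated from the sysfs variable mentioned above, e.g. by writing a CPU bitmap to /sys/class/net/<device>/queues/rx-0/rps_cpus.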
2010-03-16  NET: netpoll, fix potential NULL ptr dereference  Jiri Slaby  1 file, -2/+2
Stanse found that one error path in netpoll_setup dereferences npinfo even though it is NULL. Avoid that by adding a new label and jumping to it instead. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Daniel Borkmann <danborkmann@googlemail.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: chavey@google.com Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10  net: Fix dev_mc_add()  Eric Dumazet  1 file, -2/+3
Commit 6e17d45a (net: add addr len check to dev_mc_add) added a bug in dev_mc_add(), since it can now exit with a lock imbalance. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10  net: Annotates neigh_invalidate()  Eric Dumazet  1 file, -0/+2
Annotates neigh_invalidate() with __releases() and __acquires() for sparse's sake. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08  ethtool: Use noinline_for_stack  Eric Dumazet  1 file, -32/+8
Use the self-documenting noinline_for_stack instead of duplicated comments. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07  sock.c: potential null dereference  Dan Carpenter  1 file, -1/+2
We test that "prot->rsk_prot" is non-null right before we dereference it on this line. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05  ethtool: Add direct access to ops->get_sset_count  Jeff Garzik  1 file, -4/+3
On 03/04/2010 09:26 AM, Ben Hutchings wrote: > On Thu, 2010-03-04 at 00:51 -0800, Jeff Kirsher wrote: >> From: Jeff Garzik<jgarzik@redhat.com> >> >> This patch is an alternative approach for accessing string >> counts, vs. the drvinfo indirect approach. This way the drvinfo >> space doesn't run out, and we don't break ABI later. > [...] >> --- a/net/core/ethtool.c >> +++ b/net/core/ethtool.c >> @@ -214,6 +214,10 @@ static noinline int ethtool_get_drvinfo(struct net_device *dev, void __user *use >> info.cmd = ETHTOOL_GDRVINFO; >> ops->get_drvinfo(dev,&info); >> >> + /* >> + * this method of obtaining string set info is deprecated; >> + * consider using ETHTOOL_GSSET_INFO instead >> + */ > > This comment belongs on the interface (ethtool.h) not the > implementation. Debatable -- the current comment is located at the callsite of ops->get_sset_count(), which is where an implementor might think to add a new call. Not all the numeric fields in ethtool_drvinfo are obtained from ->get_sset_count(). Hence the "some" in the attached patch to include/linux/ethtool.h, addressing your comment. > [...] >> +static noinline int ethtool_get_sset_info(struct net_device *dev, >> + void __user *useraddr) >> +{ > [...] >> + /* calculate size of return buffer */ >> + for (i = 0; i< 64; i++) >> + if (sset_mask& (1ULL<< i)) >> + n_bits++; > [...] > > We have a function for this: > > n_bits = hweight64(sset_mask); Agreed. I've attached a follow-up patch, which should enable my/Jeff's kernel patch to be applied, followed by this one. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05  ethtool: Add direct access to ops->get_sset_count  Jeff Garzik  1 file, -0/+72
This patch is an alternative approach for accessing string counts, vs. the drvinfo indirect approach. This way the drvinfo space doesn't run out, and we don't break ABI later. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05  net: backlog functions rename  Zhu Yi  1 file, -1/+1
sk_add_backlog -> __sk_add_backlog sk_add_backlog_limited -> sk_add_backlog Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05  net: add limit for socket backlog  Zhu Yi  1 file, -2/+14
We got a system OOM while running some UDP netperf testing on the loopback device. The case is multiple senders sending stream UDP packets to a single receiver via loopback on the local host. Of course, the receiver is not able to handle all the packets in time. But we surprisingly found that these packets were not discarded due to the receiver's sk->sk_rcvbuf limit. Instead, they kept queuing to sk->sk_backlog and finally ate up all the memory. We believe this is a security hole that a non-privileged user can use to crash the system. The root cause of this problem is that when the receiver is doing __release_sock() (i.e. after userspace recv, kernel udp_recvmsg -> skb_free_datagram_locked -> release_sock), it moves skbs from the backlog to sk_receive_queue with softirqs enabled. In the above case, multiple busy senders will almost make it an endless loop. The skbs in the backlog end up eating all the system memory. The issue is not only for UDP. Any protocol using the socket backlog is potentially affected. The patch adds a limit for the socket backlog so that the backlog size cannot be expanded endlessly. Reported-by: Alex Shi <alex.shi@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
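A sketch of the limited backlog add the entry describes (simplified; the exact limit policy upstream may differ): account the queued bytes and refuse to queue once the backlog exceeds a bound derived from the receive buffer, so the caller drops the packet instead of eating memory.

    /* Illustrative sketch only. */
    static inline int sk_add_backlog_limited(struct sock *sk, struct sk_buff *skb)
    {
            /* refuse to grow the backlog without bound */
            if (sk->sk_backlog.len + skb->truesize > (unsigned int)sk->sk_rcvbuf)
                    return -ENOBUFS;        /* caller drops the packet */

            __sk_add_backlog(sk, skb);
            sk->sk_backlog.len += skb->truesize;
            return 0;
    }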
2010-02-28  Merge branch 'master' of /home/davem/src/GIT/linux-2.6/  David S. Miller  4 files, -5/+14
Conflicts: drivers/firmware/iscsi_ibft.c
2010-02-28  scm: Only support SCM_RIGHTS on unix domain sockets.  Eric W. Biederman  1 file, -0/+2
We use scm_send and scm_recv on both unix domain and netlink sockets, but only unix domain sockets support everything required for file descriptor passing, so error if someone attempts to pass file descriptors over netlink sockets. Cc: stable@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-28  ethtool: do not set some flags, if others failed  Jeff Garzik  1 file, -4/+6
The NETIF_F_NTUPLE flag setting introduced a bug: non-ntuple flags like LRO may be successfully set before ioctl(2) returns failure to userspace. The set-flags operation should be all-or-none, rather than leaving things in an inconsistent state prior to reporting failure to userspace. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27  rtnetlink: support specifying device flags on device creation  Patrick McHardy  1 file, -9/+43
commit e8469ed959c373c2ff9e6f488aa5a14971aebe1f Author: Patrick McHardy <kaber@trash.net> Date: Tue Feb 23 20:41:30 2010 +0100 Support specifying the initial device flags when creating a device through rtnl_link. Devices allocated by rtnl_create_link() are marked as INITIALIZING in order to suppress netlink registration notifications. To complete setup, rtnl_configure_link() must be called, which performs the device flag changes and invokes the deferred notifiers if everything went well. Two examples: # add macvlan to eth0 # $ ip link add link eth0 up allmulticast on type macvlan [LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN link/ether 26:f8:84:02:f9:2a brd ff:ff:ff:ff:ff:ff [ROUTE]ff00::/8 dev macvlan0 table local metric 256 mtu 1500 advmss 1440 hoplimit 0 [ROUTE]fe80::/64 dev macvlan0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0 [LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 link/ether 26:f8:84:02:f9:2a [ADDR]11: macvlan0 inet6 fe80::24f8:84ff:fe02:f92a/64 scope link valid_lft forever preferred_lft forever [ROUTE]local fe80::24f8:84ff:fe02:f92a via :: dev lo table local proto none metric 0 mtu 16436 advmss 16376 hoplimit 0 [ROUTE]default via fe80::215:e9ff:fef0:10f8 dev macvlan0 proto kernel metric 1024 mtu 1500 advmss 1440 hoplimit 0 [NEIGH]fe80::215:e9ff:fef0:10f8 dev macvlan0 lladdr 00:15:e9:f0:10:f8 router STALE [ROUTE]2001:6f8:974::/64 dev macvlan0 proto kernel metric 256 expires 0sec mtu 1500 advmss 1440 hoplimit 0 [PREFIX]prefix 2001:6f8:974::/64 dev macvlan0 onlink autoconf valid 14400 preferred 131084 [ADDR]11: macvlan0 inet6 2001:6f8:974:0:24f8:84ff:fe02:f92a/64 scope global dynamic valid_lft 86399sec preferred_lft 14399sec # add VLAN to eth1, eth1 is down # $ ip link add link eth1 up type vlan id 1000 RTNETLINK answers: Network is down <no events> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27  dev: support deferring device flag change notifications  Patrick McHardy  2 files, -61/+104
Split dev_change_flags() into two functions: __dev_change_flags() to perform the actual changes and __dev_notify_flags() to invoke the netdevice notifiers. This will be used by rtnl_link to defer netlink notifications until the device has been fully configured. This changes the ordering of some operations, in particular: - netlink notifications are sent after all changes have been performed. As a side effect this suppresses one unnecessary netlink message when the IFF_UP and other flags are changed simultaneously. - The NETDEV_UP/NETDEV_DOWN and NETDEV_CHANGE notifiers are invoked after all changes have been performed. Their relative order is unchanged. - net_dmaengine_put() is invoked before the NETDEV_DOWN notifier instead of afterwards. This should not make any difference since both RX and TX are already shut down at this point. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
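A sketch of the split described above (prototypes simplified; the exact upstream signatures may differ): dev_change_flags() becomes a thin wrapper that applies the changes first and only afterwards fires the notifiers, which is what lets rtnl_link defer the notifications.

    /* Illustrative sketch only. */
    int dev_change_flags(struct net_device *dev, unsigned int flags)
    {
            unsigned int old_flags = dev->flags;
            int ret;

            ret = __dev_change_flags(dev, flags);   /* perform the actual changes */
            if (ret < 0)
                    return ret;

            /* all changes done: send NETDEV_UP/NETDEV_DOWN/NETDEV_CHANGE
             * notifiers and the netlink notification in one place */
            __dev_notify_flags(dev, old_flags);
            return ret;
    }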
2010-02-27  rtnetlink: handle rtnl_link netlink notifications manually  Patrick McHardy  2 files, -4/+8
In order to support specifying device flags during device creation, we must be able to roll back device registration in case setting the flags fails, without sending any notifications related to the device to userspace. This patch changes rollback_registered_many() and register_netdevice() to manually send netlink notifications for devices not handled by rtnl_link, and allows deferring notifications for devices handled by rtnl_link until setup is complete. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27  rtnetlink: ignore NETDEV_PRE_UP notifier in rtnetlink_event()  Patrick McHardy  1 file, -0/+1
Commit 3b8bcfd (net: introduce pre-up netdev notifier) added a new notifier which is run before a device is set UP, for use by cfg80211. The patch missed adding the new notifier to the ignore list in rtnetlink_event(), so we currently get an unnecessary netlink notification before a device is set UP. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26Revert "ethtool: Add n-tuple string length to drvinfo and return it"David S. Miller1-3/+0
This reverts commit c79c5ffdce14abb4de3878c5aa8c3c6e5093c69b. As Jeff points out we can't break the user visible interface like this, we need to add this into the reserved[] thing. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26  net: add addr len check to dev_mc_add  Jiri Pirko  1 file, -0/+2
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26  ethtool: Add n-tuple string length to drvinfo and return it  Peter Waskiewicz  1 file, -0/+3
The drvinfo struct should include the number of strings that get_rx_ntuple will return. It will be variable if an underlying driver implements its own get_rx_ntuple routine, so userspace needs to know how much data is coming. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26  netdev: use list_first_entry macro  stephen hemminger  1 file, -3/+3
Use list_first_entry macro; no longer any need to use 'next' directly in list to find first entry. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26  rtnetlink: clean up SR-IOV config interface  Williams, Mitch A  1 file, -8/+5
This patch consists of a few minor cleanups to the SR-IOV configuration code in rtnetlink: - Remove unnecessary lock - Remove unnecessary casts - Return correct error code for no driver support. These changes are based on comments from Patrick McHardy. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-25  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6  David S. Miller  1 file, -1/+2
2010-02-25  net: Add checking to rcu_dereference() primitives  Paul E. McKenney  4 files, -5/+14
Update rcu_dereference() primitives to use new lockdep-based checking. The rcu_dereference() in __in6_dev_get() may be protected either by rcu_read_lock() or RTNL, per Eric Dumazet. The rcu_dereference() in __sk_free() is protected by the fact that it is never reached if an update could change it. Check for this by using rcu_dereference_check() to verify that the struct sock's ->sk_wmem_alloc counter is zero. Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
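A sketch of the lockdep-checked dereference the entry describes for __sk_free() (illustrative; field usage as commonly seen in net/core/sock.c of that era): rcu_dereference_check() takes the pointer plus a condition documenting why the access is safe even without rcu_read_lock().

    /* Illustrative sketch only: the socket can no longer be updated once its
     * write-allocation counter has dropped to zero, so that condition is
     * passed to the checked dereference. */
    struct sk_filter *filter;

    filter = rcu_dereference_check(sk->sk_filter,
                                   atomic_read(&sk->sk_wmem_alloc) == 0);
    if (filter)
            sk_filter_uncharge(sk, filter);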
2010-02-23  net: bug fix for vlan + gro issue  Ajit Khaparde  1 file, -1/+1
Traffic (tcp) does not start on a vlan interface when gro is enabled. Even the tcp handshake was not taking place. This is because the eth_type_trans call before the netif_receive_skb in napi_gro_finish() resets the skb->dev to napi->dev from the previously set vlan netdev interface. This causes ip_route_input to drop the incoming packet, considering it a packet coming from a martian source. I could repro this on 2.6.32.7 (stable) and 2.6.33-rc7. With this fix, the traffic starts and the test runs fine on both vlan and non-vlan interfaces. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>