linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2008-07-17	net: Use queue aware tests throughout.	David S. Miller	2	-12/+74
	This effectively "flips the switch" by making the core networking and multiqueue-aware drivers use the new TX multiqueue structures. Non-multiqueue drivers need no changes. The interfaces they use such as netif_stop_queue() degenerate into an operation on TX queue zero. So everything "just works" for them. Code that really wants to do "X" to all TX queues now invokes a routine that does so, such as netif_tx_wake_all_queues(), netif_tx_stop_all_queues(), etc. pktgen and netpoll required a little bit more surgery than the others. In particular the pktgen changes, whilst functional, could be largely improved. The initial check in pktgen_xmit() will sometimes check the wrong queue, which is mostly harmless. The thing to do is probably to invoke fill_packet() earlier. The bulk of the netpoll changes is to make the code operate solely on the TX queue indicated by by the SKB queue mapping. Setting of the SKB queue mapping is entirely confined inside of net/core/dev.c:dev_pick_tx(). If we end up needing any kind of special semantics (drops, for example) it will be implemented here. Finally, we now have a "real_num_tx_queues" which is where the driver indicates how many TX queues are actually active. With IGB changes from Jeff Kirsher. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17	pkt_sched: Remove RR scheduler.	David S. Miller	1	-9/+0
	This actually fixes a bug added by the RR scheduler changes. The ->bands and ->prio2band parameters were being set outside of the sch_tree_lock() and thus could result in strange behavior and inconsistencies. It might be possible, in the new design (where there will be one qdisc per device TX queue) to allow similar functionality via a TX hash algorithm for RR but I really see no reason to export this aspect of how these multiqueue cards actually implement the scheduling of the the individual DMA TX rings and the single physical MAC/PHY port. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17	netdev: Kill NETIF_F_MULTI_QUEUE.	David S. Miller	1	-3/+1
	There is no need for a feature bit for something that can be tested by simply checking the TX queue count. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17	netdev: Allocate multiple queues for TX.	David S. Miller	2	-34/+72
	alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	core: add stat to track unresolved discards in neighbor cache	Neil Horman	1	-1/+3
	in __neigh_event_send, if we have a neighbour entry which is in NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour to the neighbours arp_queue, which is default capped to a length of 3 skbs. If that queue exceeds its set length, it will drop an skb on the queue to enqueue the newly arrived skb. This results in a drop for which we have no statistics incremented. This patch adds an unresolved_discards stat to /proc/net/stat/ndisc_cache to track these lost frames. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to NET_ADD_STATS_USER	Pavel Emelyanov	1	-1/+1
	Done with NET_XXX_STATS macros :) To be continued... Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to NET_ADD_STATS_BH	Pavel Emelyanov	1	-1/+1
	This one is tricky. The thing is that this macro is only used when killing tw buckets, but since this killer is promiscuous wrt to which net each particular tw belongs to, I have to use it only when NET_NS is off. When the net namespaces are on, I use the INET_INC_STATS_BH for each bucket. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to NET_INC_STATS_USER	Pavel Emelyanov	1	-1/+1
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to NET_INC_STATS_BH	Pavel Emelyanov	2	-2/+2
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to NET_INC_STATS	Pavel Emelyanov	1	-1/+1
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	sock: add net to prot->enter_memory_pressure callback	Pavel Emelyanov	2	-3/+3
	The tcp_enter_memory_pressure calls NET_INC_STATS, but doesn't have where to get the net from. I decided to add a sk argument, not the net itself, only to factor all the required sock_net(sk) calls inside the enter_memory_pressure callback itself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to TCP_ADD_STATS_USER	Pavel Emelyanov	1	-5/+5
	Now we're done with the TCP_XXX_STATS macros. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to TCP_DEC_STATS	Pavel Emelyanov	1	-1/+1
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to TCP_INC_STATS_BH	Pavel Emelyanov	1	-1/+1
	Same as before - the sock is always there to get the net from, but there are also some places with the net already saved on the stack. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to TCP_INC_STATS	Pavel Emelyanov	1	-1/+1
	Fortunately (almost) all the TCP code has a sock to get the net from :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	tcp: add net to tcp_mib_init	Pavel Emelyanov	1	-1/+1
	This one sets TCP MIBs after zeroing them, and thus requires the net. The existing single caller can use init_net (temporarily). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: drop unused TCP_XXX_STATS macros	Pavel Emelyanov	1	-2/+0
	TCP_INC_STATS_USER and TCP_ADD_STATS_BH are currently unused. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to IP_ADD_STATS_BH	Pavel Emelyanov	1	-1/+1
	Very simple - only ip_evictor (fragments) requires such. This patch ends up the IP_XXX_STATS patching. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to IP_INC_STATS_BH	Pavel Emelyanov	1	-1/+1
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: add net to IP_INC_STATS	Pavel Emelyanov	1	-1/+1
	All the callers already have either the net itself, or the place where to get it from. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16	mib: drop unused IP_INC_STATS_USER	Pavel Emelyanov	1	-1/+0
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15	netdev: Add netdev->addr_list_lock protection.	David S. Miller	1	-0/+20
	Add netif_addr_{lock,unlock}{,_bh}() helpers. Use them to protect operations that operate on or read the network device unicast and multicast address lists. Also use them in cases where the code simply wants to block calls into the driver's ->set_rx_mode() and ->set_multicast_list() methods. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15	netdev: Add addr_list_lock to struct net_device.	David S. Miller	1	-0/+1
	This will be used to protect the per-device unicast and multicast address lists, as well as the callbacks into the drivers which configure such state such as ->set_rx_mode() and ->set_multicast_list(). Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	mib: add struct net to ICMPMSGIN_INC_STATS_BH	Pavel Emelyanov	1	-1/+1
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	mib: add struct net to ICMPMSGOUT_INC_STATS	Pavel Emelyanov	1	-1/+1
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	mib: add struct net to ICMP_INC_STATS_BH	Pavel Emelyanov	1	-1/+1
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	mib: add struct net to ICMP_INC_STATS	Pavel Emelyanov	1	-1/+1
	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	icmp: drop unused MIB accounting wrappers	Pavel Emelyanov	1	-5/+0
	There are ICMP_XXX_STATS that are not used in the kernel, so I remove them, not to "just patch" them later. But if there's some sense in keeping them, kick me - I will remake this set keeping them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	icmp: add struct net argument to icmp_out_count	Pavel Emelyanov	1	-1/+2
	This routine deals with ICMP statistics, but doesn't have a struct net at hands, so add one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	packet: deliver VLAN TCI to userspace	Patrick McHardy	1	-0/+2
	Store the VLAN tag in the auxillary data/tpacket2_hdr so userspace can properly deal with hardware VLAN tagging/stripping. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	packet: support extensible, 64 bit clean mmaped ring structure	Patrick McHardy	1	-0/+21
	The tpacket_hdr is not 64 bit clean due to use of an unsigned long and can't be extended because the following struct sockaddr_ll needs to be at a fixed offset. Add support for a version 2 tpacket protocol that removes these limitations. Userspace can query the header size through a new getsockopt option and change the protocol version through a setsockopt option. The changes needed to switch to the new protocol version are: 1. replace struct tpacket_hdr by struct tpacket2_hdr 2. query header len and save 3. set protocol version to 2 - set up ring as usual 4. for getting the sockaddr_ll, use (void )hdr + TPACKET_ALIGN(hdrlen) instead of (void )hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr)) Steps 2 and 4 can be omitted if the struct sockaddr_ll isn't needed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	vlan: deliver packets received with VLAN acceleration to network taps	Patrick McHardy	1	-0/+1
	When VLAN header stripping is used, packets currently bypass packet sockets (and other network taps) completely. For locally existing VLANs, they appear directly on the VLAN device, for unknown VLANs they are silently dropped. Add a new function netif_nit_deliver() to deliver incoming packets to all network interface taps and use it in __vlan_hwaccel_rx() to make VLAN packets visible on the underlying device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	vlan: Don't store VLAN tag in cb	Patrick McHardy	2	-24/+10
	Use a real skb member to store the skb to avoid clashes with qdiscs, which are allowed to use the cb area themselves. As currently only real devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit invalidation is neccessary. The new member fills a hole on 64 bit, the skb layout changes from: __u32 mark; /* 172 4 / sk_buff_data_t transport_header; / 176 4 / sk_buff_data_t network_header; / 180 4 / sk_buff_data_t mac_header; / 184 4 / sk_buff_data_t tail; / 188 4 / / --- cacheline 3 boundary (192 bytes) --- / sk_buff_data_t end; / 192 4 / / XXX 4 bytes hole, try to pack / to __u32 mark; / 172 4 / __u16 vlan_tci; / 176 2 / / XXX 2 bytes hole, try to pack / sk_buff_data_t transport_header; / 180 4 / sk_buff_data_t network_header; / 184 4 */ Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	tipc: Remove unneeded parameter to tipc_createport_raw()	Allan Stephens	1	-10/+3
	This patch eliminates an unneeded parameter when creating a low-level TIPC port object. Instead of returning both the pointer to the port structure and the port's reference ID, it now returns only the pointer since the port structure contains the reference ID as one of its fields. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	Merge branch 'davem-next' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6	David S. Miller	1	-0/+13

2008-07-14	tun: Fix/rewrite packet filtering logic	Max Krasnyansky	1	-3/+21
	Please see the following thread to get some context on this http://marc.info/?l=linux-netdev&m=121564433018903&w=2 Basically the issue is that current multi-cast filtering stuff in the TUN/TAP driver is seriously broken. Original patch went in without proper review and ACK. It was broken and confusing to start with and subsequent patches broke it completely. To give you an idea of what's broken here are some of the issues: - Very confusing comments throughout the code that imply that the character device is a network interface in its own right, and that packets are passed between the two nics. Which is completely wrong. - Wrong set of ioctls is used for setting up filters. They look like shortcuts for manipulating state of the tun/tap network interface but in reality manipulate the state of the TX filter. - ioctls that were originally used for setting address of the the TX filter got "fixed" and now set the address of the network interface itself. Which made filter totaly useless. - Filtering is done too late. Instead of filtering early on, to avoid unnecessary wakeups, filtering is done in the read() call. The list goes on and on :) So the patch cleans all that up. It introduces simple and clean interface for setting up TX filters (TUNSETTXFILTER + tun_filter spec) and does filtering before enqueuing the packets. TX filtering is useful in the scenarios where TAP is part of a bridge, in which case it gets all broadcast, multicast and potentially other packets when the bridge is learning. So for example Ethernet tunnelling app may want to setup TX filters to avoid tunnelling multicast traffic. QEMU and other hypervisors can push RX filtering that is currently done in the guest into the host context therefore saving wakeups and unnecessary data transfer. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	David S. Miller	2	-38/+36

2008-07-14	net-sched: cls_flow: add perturbation support	Patrick McHardy	1	-0/+1
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	Merge branch 'master' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	David S. Miller	1	-4/+2

2008-07-14	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	David S. Miller	1	-0/+1
	Conflicts: net/netfilter/nf_conntrack_proto_tcp.c
2008-07-14	netfilter: Let nf_ct_kill() callers know if del_timer() returned true.	David S. Miller	1	-10/+10
	Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14	mac80211: fix struct ieee80211_tx_queue_params	Johannes Berg	1	-5/+5
	Multiple issues: - there are no "default" values needed - cw_min/cw_max can be larger than documented - restructure to decrease size - use get_unaligned_le16 Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14	mac80211: fix TX sequence numbers	Johannes Berg	1	-0/+12
	This patch makes mac80211 assign proper sequence numbers to QoS-data frames. It also removes the old sequence number code because we noticed that only the driver or hardware can assign sequence numbers to non-QoS-data and especially management frames in a race-free manner because beacons aren't passed through mac80211's TX path. This patch also adds temporary code to the rt2x00 drivers to not break them completely, that code will have to be reworked for proper sequence numbers on beacons. It also moves sequence number assignment down in the TX path so no sequence numbers are assigned to frames that are dropped. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14	ssb: Include dma-mapping.h	Michael Buesch	1	-0/+1
	ssb.h implements DMA mapping functions, so it should include dma-mapping.h. This fixes compile failures on certain architectures. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14	mac80211: revamp beacon configuration	Johannes Berg	1	-33/+16
	This patch changes mac80211's beacon configuration handling to never pass skbs to the driver directly but rather always require the driver to use ieee80211_beacon_get(). Additionally, it introduces "change flags" on the config_interface() call to enable drivers to figure out what is changing. Finally, it removes the beacon_update() driver callback in favour of having IBSS beacon delivered by ieee80211_beacon_get() as well. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14	mac80211: power management wext hooks	Samuel Ortiz	1	-0/+2
	This patch implements the power management routines wireless extensions for mac80211. For now we only support switching PS mode between on and off. Signed-off-by: Samuel Ortiz <sameo@openedhand.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-13	dccp: Upgrade NDP count from 3 to 6 bytes	Gerrit Renker	1	-4/+2
	RFC 4340, 7.7 specifies up to 6 bytes for the NDP Count option, whereas the code is currently limited to up to 3 bytes. This seems to be a relict of an earlier draft version and is brought up to date by the patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-11	net: add netif_napi_del function to allow for removal of napistructs	Alexander Duyck	1	-0/+13
	Adds netif_napi_del function which is used to remove the napi struct from the netdev napi_list in cases where CONFIG_NETPOLL was enabled. The motivation for adding this is to handle the case in which the number of queues on a device changes due to a configuration change. Previously the napi structs for each queue would be left in the list until the netdev was freed. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-10	xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info	Steffen Klassert	1	-0/+1
	Add a XFRM_STATE_AF_UNSPEC flag to handle the AF_UNSPEC behavior for the selector family. Userspace applications can set this flag to leave the selector family of the xfrm_state unspecified. This can be used to to handle inter family tunnels if the selector is not set from userspace. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08	netdev: Move atomic queue state bits into netdev_queue.	David S. Miller	2	-17/+40
	Signed-off-by: David S. Miller <davem@davemloft.net>