linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2011-02-02	sched: CHOKe flow scheduler	stephen hemminger	1	-0/+29
	CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative packet scheduler based on the Random Exponential Drop (RED) algorithm. The core idea is: For every packet arrival: Calculate Qave if (Qave < minth) Queue the new packet else Select randomly a packet from the queue if (both packets from same flow) then Drop both the packets else if (Qave > maxth) Drop packet else Admit packet with proability p (same as RED) See also: Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active queue management scheme for approximating fair bandwidth allocation", Proceeding of INFOCOM'2000, March 2000. Help from: Eric Dumazet <eric.dumazet@gmail.com> Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	David S. Miller	13	-1/+1880

2011-02-03	netfilter: xtables: add device group match	Patrick McHardy	2	-0/+22
	Add a new 'devgroup' match to match on the device group of the incoming and outgoing network device of a packet. Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-02	netfilter: ipset: fix linking with CONFIG_IPV6=n	Patrick McHardy	1	-0/+10
	Add a dummy ip_set_get_ip6_port function that unconditionally returns false for CONFIG_IPV6=n and convert the real function to ipv6_skip_exthdr() to avoid pulling in the ip6_tables module when loading ipset. Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01	netfilter: ipset: install ipset related header files	Patrick McHardy	3	-0/+8
	Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01	netfilter: xtables: "set" match and "SET" target support	Jozsef Kadlecsik	1	-0/+55
	The patch adds the combined module of the "SET" target and "set" match to netfilter. Both the previous and the current revisions are supported. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01	netfilter: ipset: list:set set type support	Jozsef Kadlecsik	1	-0/+27
	The module implements the list:set type support in two flavours: without and with timeout. The sets has two sides: for the userspace, they store the names of other (non list:set type of) sets: one can add, delete and test set names. For the kernel, it forms an ordered union of the member sets: the members sets are tried in order when elements are added, deleted and tested and the process stops at the first success. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01	netfilter: ipset: hash:ip set type support	Jozsef Kadlecsik	2	-0/+1100
	The module implements the hash:ip type support in four flavours: for IPv4 or IPv6, both without and with timeout support. All the hash types are based on the "array hash" or ahash structure and functions as a good compromise between minimal memory footprint and speed. The hashing uses arrays to resolve clashes. The hash table is resized (doubled) when searching becomes too long. Resizing can be triggered by userspace add commands only and those are serialized by the nfnl mutex. During resizing the set is read-locked, so the only possible concurrent operations are the kernel side readers. Those are protected by RCU locking. Because of the four flavours and the other hash types, the functions are implemented in general forms in the ip_set_ahash.h header file and the real functions are generated before compiling by macro expansion. Thus the dereferencing of low-level functions and void pointer arguments could be avoided: the low-level functions are inlined, the function arguments are pointers of type-specific structures. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01	netfilter: ipset: bitmap:ip set type support	Jozsef Kadlecsik	2	-0/+158
	The module implements the bitmap:ip set type in two flavours, without and with timeout support. In this kind of set one can store IPv4 addresses (or network addresses) from a given range. In order not to waste memory, the timeout version does not rely on the kernel timer for every element to be timed out but on garbage collection. All set types use this mechanism. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01	netfilter: ipset: IP set core support	Jozsef Kadlecsik	3	-0/+498
	The patch adds the IP set core support to the kernel. The IP set core implements a netlink (nfnetlink) based protocol by which one can create, destroy, flush, rename, swap, list, save, restore sets, and add, delete, test elements from userspace. For simplicity (and backward compatibilty and for not to force ip(6)tables to be linked with a netlink library) reasons a small getsockopt-based protocol is also kept in order to communicate with the ip(6)tables match and target. The netlink protocol passes all u16, etc values in network order with NLA_F_NET_BYTEORDER flag. The protocol enforces the proper use of the NLA_F_NESTED and NLA_F_NET_BYTEORDER flags. For other kernel subsystems (netfilter match and target) the API contains the functions to add, delete and test elements in sets and the required calls to get/put refereces to the sets before those operations can be performed. The set types (which are implemented in independent modules) are stored in a simple RCU protected list. A set type may have variants: for example without timeout or with timeout support, for IPv4 or for IPv6. The sets (i.e. the pointers to the sets) are stored in an array. The sets are identified by their index in the array, which makes possible easy and fast swapping of sets. The array is protected indirectly by the nfnl mutex from nfnetlink. The content of the sets are protected by the rwlock of the set. There are functional differences between the add/del/test functions for the kernel and userspace: - kernel add/del/test: works on the current packet (i.e. one element) - kernel test: may trigger an "add" operation in order to fill out unspecified parts of the element from the packet (like MAC address) - userspace add/del: works on the netlink message and thus possibly on multiple elements from the IPSET_ATTR_ADT container attribute. - userspace add: may trigger resizing of a set Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01	netfilter: NFNL_SUBSYS_IPSET id and NLA_PUT_NET* macros	Jozsef Kadlecsik	1	-1/+2
	The patch adds the NFNL_SUBSYS_IPSET id and NLA_PUT_NET* macros to the vanilla kernel. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-31	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	David S. Miller	3	-0/+4

2011-01-30	net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT	Eric W. Biederman	1	-0/+1
	SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1, which unfortunately means the existing infrastructure for compat networking ioctls is insufficient. A trivial compact ioctl implementation would conflict with: SIOCAX25ADDUID SIOCAIPXPRISLT SIOCGETSGCNT_IN6 SIOCGETSGCNT SIOCRSSCAUSE SIOCX25SSUBSCRIP SIOCX25SDTEFACILITIES To make this work I have updated the compat_ioctl decode path to mirror the the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function so that I can have ipv4 specific compat ioctls. I have added a compat_ioctl function into struct proto so I can break out ioctls by which kind of ip socket I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only works on raw sockets. I have added a ipmr_compat_ioctl that mirrors the normal ipmr_ioctl. This was necessary because unfortunately the struct layout for the SIOCGETSGCNT has unsigned longs in it so changes between 32bit and 64bit kernels. This change was sufficient to run a 32bit ip multicast routing daemon on a 64bit kernel. Reported-by: Bill Fenner <fenner@aristanetworks.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-30	caif: bugfix - add caif headers for userspace usage.	sjur.brandeland@stericsson.com	2	-0/+3
	Add caif_socket.h and if_caif.h to the kernel header files exported for use by userspace. Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27	net: fix dev_seq_next()	Eric Dumazet	1	-1/+8
	Commit c6d14c84566d (net: Introduce for_each_netdev_rcu() iterator) added a race in dev_seq_next(). The rcu_dereference() call should be done _before_ testing the end of list, or we might return a wrong net_device if a concurrent thread changes net_device list under us. Note : discovered thanks to a sparse warning : net/core/dev.c:3919:9: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24	net: reduce and unify printk level in netdev_fix_features()	Michał Mirosław	1	-1/+1
	Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will be used by ethtool and might spam dmesg unnecessarily. This converts the function to use netdev_info() instead of plain printk(). As a side effect, bonding and bridge devices will now log dropped features on every slave device change. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24	net: change netdev->features to u32	Michał Mirosław	2	-13/+13
	Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24	net: RPS: Enable hardware acceleration of RFS	Ben Hutchings	1	-3/+30
	Allow drivers for multiqueue hardware with flow filter tables to accelerate RFS. The driver must: 1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion IRQs (in queue order). This will provide a mapping from CPUs to the queues for which completions are handled nearest to them. 2. Implement net_device_ops::ndo_rx_flow_steer. This operation adds or replaces a filter steering the given flow to the given RX queue, if possible. 3. Periodically remove filters for which rps_may_expire_flow() returns true. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24	lib: cpu_rmap: CPU affinity reverse-mapping	Ben Hutchings	1	-0/+73
	When initiating I/O on a multiqueue and multi-IRQ device, we may want to select a queue for which the response will be handled on the same or a nearby CPU. This requires a reverse-map of IRQ affinity. Add library functions to support a generic reverse-mapping from CPUs to objects with affinity and the specific case where the objects are IRQs. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24	Merge branch 'irq/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	David S. Miller	2	-1/+35

2011-01-24	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	David S. Miller	15	-44/+100
	Conflicts: net/sched/sch_hfsc.c net/sched/sch_htb.c net/sched/sch_tbf.c
2011-01-25	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	1	-4/+0
	* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: RTC: Remove Kconfig symbol for UIE emulation RTC: Properly handle rtc_read_alarm error propagation and fix bug RTC: Propagate error handling via rtc_timer_enqueue properly acpi_pm: Clear pmtmr_ioport if acpi_pm initialization fails rtc: Cleanup removed UIE emulation declaration hrtimers: Notify hrtimer users of switches to NOHZ mode
2011-01-24	Merge branch 'BUG_ON' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus	Linus Torvalds	4	-9/+32
	* 'BUG_ON' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: Remove MAYBE_BUILD_BUG_ON BUILD_BUG_ON: make it handle more cases
2011-01-24	Remove MAYBE_BUILD_BUG_ON	Rusty Russell	4	-4/+6
	Now BUILD_BUG_ON() can handle optimizable constants, we don't need MAYBE_BUILD_BUG_ON any more. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24	BUILD_BUG_ON: make it handle more cases	Rusty Russell	1	-6/+27
	BUILD_BUG_ON used to use the optimizer to do code elimination or fail at link time; it was changed to first the size of a negative array (a nicer compile time error), then (in 8c87df457cb58fe75b9b893007917cf8095660a0) to a bitfield. This forced us to change some non-constant cases to MAYBE_BUILD_BUG_ON(); as Jan points out in that commit, it didn't work as intended anyway. bitfields: needs a literal constant at parse time, and can't be put under "if (__builtin_constant_p(x))" for example. negative array: can handle anything, but if the compiler can't tell it's a constant, silently has no effect. link time: breaks link if the compiler can't determine the value, but the linker output is not usually as informative as a compiler error. If we use the negative-array-size method and the link time trick, we get the ability to use BUILD_BUG_ON() under __builtin_constant_p() branches, and maximal ability for the compiler to detect errors at build time. We also document it thoroughly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jan Beulich <JBeulich@novell.com> Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
2011-01-24	param: add null statement to compiled-in module params	Linus Walleij	1	-2/+4
	Add an unused struct declaration statement requiring a terminating semicolon to the compile-in case to provoke an error if __MODULE_INFO() is used without the terminating semicolon. Previously MODULE_ALIAS("foo") (no semicolon) compiled fine if MODULE was not selected. Cc: Dan Carpenter <error27@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24	module: fix linker error for MODULE_VERSION when !MODULE and CONFIG_SYSFS=n	Rusty Russell	1	-1/+1
	lib/built-in.o:(__modver+0x8): undefined reference to `__modver_version_show' lib/built-in.o:(__modver+0x2c): undefined reference to `__modver_version_show' Simplest to just not emit anything: if they've disabled SYSFS they probably want the smallest kernel possible. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24	module: show version information for built-in modules in sysfs	Dmitry Torokhov	1	-0/+27
	Currently only drivers that are built as modules have their versions shown in /sys/module/<module_name>/version, but this information might also be useful for built-in drivers as well. This especially important for drivers that do not define any parameters - such drivers, if built-in, are completely invisible from userspace. This patch changes MODULE_VERSION() macro so that in case when we are compiling built-in module, version information is stored in a separate section. Kernel then uses this data to create 'version' sysfs attribute in the same fashion it creates attributes for module parameters. Signed-off-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-22	genirq: Add IRQ affinity notifiers	Ben Hutchings	2	-1/+35
	When initiating I/O on a multiqueue and multi-IRQ device, we may want to select a queue for which the response will be handled on the same or a nearby CPU. This requires a reverse-map of IRQ affinity. Add a notification mechanism to support this. This is based closely on work by Thomas Gleixner <tglx@linutronix.de>. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: linux-net-drivers@solarflare.com Cc: Tom Herbert <therbert@google.com> Cc: David Miller <davem@davemloft.net> LKML-Reference: <1295470904.11126.84.camel@bwh-desktop> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-21	Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	Linus Torvalds	1	-0/+3
	* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: note the nested NOT_RUNNING test in worker_clr_flags() isn't a noop workqueue: relax lockdep annotation on flush_work()
2011-01-21	Merge branch 'irq-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	1	-14/+0
	* 'irq-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits) um: Use generic irq Kconfig tile: Use generic irq Kconfig sparc: Use generic irq Kconfig score: Use generic irq Kconfig powerpc: Use generic irq Kconfig parisc: Use generic irq Kconfig mn10300: Use generic irq Kconfig microblaze: Use generic irq Kconfig m68knommu: Use generic irq Kconfig ia64: Use generic irq Kconfig frv: Use generic irq Kconfig blackfin: Use generic irq Kconfig alpha: Use generic irq Kconfig genirq: Remove __do_IRQ m32r: Convert to generic irq Kconfig m32r: Convert usrv platform irq handling m32r: Convert opsput_lcdpld irq chip m32r: Convert opsput lanpld irq chip m32r: Convert opsput pld irq chip m32r: Convert opsput irq chip ...
2011-01-21	mm: System without MMU do not need pte_mkwrite	Michal Simek	1	-0/+2
	The patch "thp: export maybe_mkwrite" (commit 14fd403f2146) breaks systems without MMU. Error log: CC arch/microblaze/mm/init.o In file included from include/linux/mman.h:14, from arch/microblaze/mm/consistent.c:24: include/linux/mm.h: In function 'maybe_mkwrite': include/linux/mm.h:482: error: implicit declaration of function 'pte_mkwrite' include/linux/mm.h:482: error: incompatible types in assignment Signed-off-by: Michal Simek <monstr@monstr.eu> CC: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-21	RTC: Propagate error handling via rtc_timer_enqueue properly	John Stultz	1	-2/+0
	In cases where RTC hardware does not support alarms, the virtualized RTC interfaces did not have a way to propagate the error up to userland. This patch extends rtc_timer_enqueue so it catches errors from the hardware and returns them upwards to the virtualized interfaces. To simplify error handling, it also internalizes the management of the timer->enabled bit into rtc_timer_enqueue and rtc_timer_remove. Also makes rtc_timer_enqueue and rtc_timer_remove static. Reported-by: David Daney <ddaney@caviumnetworks.com> Reported-by: Andreas Schwab <schwab@linux-m68k.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Diagnosed-by: David Daney <ddaney@caviumnetworks.com> Tested-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: John Stultz <john.stultz@linaro.org> LKML-Reference: <1295565973-14358-1-git-send-email-john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-21	rtc: Cleanup removed UIE emulation declaration	John Stultz	1	-2/+0
	rtc_dev_update_irq_enable_emul was removed in commit 042620a018afcfba1d678062b62e463b9e43a68d (UIE emulation is now handled via hrtimer), but the declaration was missed. This patch cleans it up. Signed-off-by: John Stultz <john.stultz@linaro.org> LKML-Reference: <1294939849-20608-1-git-send-email-john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-21	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6	Linus Torvalds	2	-4/+5
	* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Fix deadlock during path resolution
2011-01-21	genirq: Remove __do_IRQ	Thomas Gleixner	1	-14/+0
	All architectures are finally converted. Remove the cruft. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Mike Frysinger <vapier@gentoo.org> Cc: David Howells <dhowells@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Michal Simek <monstr@monstr.eu> Acked-by: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jeff Dike <jdike@addtoit.com>
2011-01-20	net: Add safe reverse SKB queue walkers.	David S. Miller	1	-0/+9
	Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	2	-8/+2
	* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c lockdep: Move early boot local IRQ enable/disable status to init/main.c
2011-01-20	ACPI: Introduce acpi_os_ioremap()	Rafael J. Wysocki	2	-3/+16
	Commit ca9b600be38c ("ACPI / PM: Make suspend_nvs_save() use acpi_os_map_memory()") attempted to prevent the code in osl.c and nvs.c from using different ioremap() variants by making the latter use acpi_os_map_memory() for mapping the NVS pages. However, that also requires acpi_os_unmap_memory() to be used for unmapping them, which causes synchronize_rcu() to be executed many times in a row unnecessarily and introduces substantial delays during resume on some systems. Instead of using acpi_os_map_memory() for mapping the NVS pages in nvs.c introduce acpi_os_ioremap() calling ioremap_cache() and make the code in both osl.c and nvs.c use it. Reported-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	memcg: fix USED bit handling at uncharge in THP	KAMEZAWA Hiroyuki	1	-0/+9
	Now, under THP: at charge: - PageCgroupUsed bit is set to all page_cgroup on a hugepage. ....set to 512 pages. at uncharge - PageCgroupUsed bit is unset on the head page. So, some pages will remain with "Used" bit. This patch fixes that Used bit is set only to the head page. Used bits for tail pages will be set at splitting if necessary. This patch adds this lock order: compound_lock() -> page_cgroup_move_lock(). [akpm@linux-foundation.org: fix warning] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	dccp: clean up unused DCCP_STATE_MASK definition	Shan Wei	1	-2/+0
	Remove unused DCCP_STATE_MASK macro. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20	netfilter: xtables: add missing header inclusions for headers_check	Jan Engelhardt	39	-0/+77
	Resolve these warnings on `make headers_check`: usr/include/linux/netfilter/xt_CT.h:7: found __[us]{8,16,32,64} type without #include <linux/types.h> ... Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-20	Merge branch 'connlimit' of git://dev.medozas.de/linux	Patrick McHardy	1	-1/+1

2011-01-20	netfilter: xtables: remove duplicate member	Jan Engelhardt	1	-1/+1
	Accidentally missed removing the old out-of-union "inverse" member, which caused the struct size to change which then gives size mismatch warnings when using an old iptables. It is interesting to see that gcc did not warn about this before. (Filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47376 ) Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-20	lockdep: Move early boot local IRQ enable/disable status to init/main.c	Tejun Heo	2	-8/+2
	During early boot, local IRQ is disabled until IRQ subsystem is properly initialized. During this time, no one should enable local IRQ and some operations which usually are not allowed with IRQ disabled, e.g. operations which might sleep or require communications with other processors, are allowed. lockdep tracked this with early_boot_irqs_off/on() callbacks. As other subsystems need this information too, move it to init/main.c and make it generally available. While at it, toggle the boolean to early_boot_irqs_disabled instead of enabled so that it can be initialized with %false and %true indicates the exceptional condition. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110120110635.GB6036@htj.dyndns.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-20	Merge branch 'connlimit' of git://dev.medozas.de/linux	Patrick McHardy	1	-0/+12
	Conflicts: Documentation/feature-removal-schedule.txt Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	David S. Miller	48	-162/+264

2011-01-20	netfilter: xtables: remove extraneous header that slipped in	Jan Engelhardt	1	-1/+0
	Commit 0b8ad87 (netfilter: xtables: add missing header files to export list) erroneously added this. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-19	net_sched: implement a root container qdisc sch_mqprio	John Fastabend	1	-0/+12
	This implements a mqprio queueing discipline that by default creates a pfifo_fast qdisc per tx queue and provides the needed configuration interface. Using the mqprio qdisc the number of tcs currently in use along with the range of queues alloted to each class can be configured. By default skbs are mapped to traffic classes using the skb priority. This mapping is configurable. Configurable parameters, struct tc_mqprio_qopt { __u8 num_tc; __u8 prio_tc_map[TC_BITMASK + 1]; __u8 hw; __u16 count[TC_MAX_QUEUE]; __u16 offset[TC_MAX_QUEUE]; }; Here the count/offset pairing give the queue alignment and the prio_tc_map gives the mapping from skb->priority to tc. The hw bit determines if the hardware should configure the count and offset values. If the hardware bit is set then the operation will fail if the hardware does not implement the ndo_setup_tc operation. This is to avoid undetermined states where the hardware may or may not control the queue mapping. Also minimal bounds checking is done on the count/offset to verify a queue does not exceed num_tx_queues and that queue ranges do not overlap. Otherwise it is left to user policy or hardware configuration to create useful mappings. It is expected that hardware QOS schemes can be implemented by creating appropriate mappings of queues in ndo_tc_setup(). One expected use case is drivers will use the ndo_setup_tc to map queue ranges onto 802.1Q traffic classes. This provides a generic mechanism to map network traffic onto these traffic classes and removes the need for lower layer drivers to know specifics about traffic types. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19	net: implement mechanism for HW based QOS	John Fastabend	1	-0/+68
	This patch provides a mechanism for lower layer devices to steer traffic using skb->priority to tx queues. This allows for hardware based QOS schemes to use the default qdisc without incurring the penalties related to global state and the qdisc lock. While reliably receiving skbs on the correct tx ring to avoid head of line blocking resulting from shuffling in the LLD. Finally, all the goodness from txq caching and xps/rps can still be leveraged. Many drivers and hardware exist with the ability to implement QOS schemes in the hardware but currently these drivers tend to rely on firmware to reroute specific traffic, a driver specific select_queue or the queue_mapping action in the qdisc. By using select_queue for this drivers need to be updated for each and every traffic type and we lose the goodness of much of the upstream work. Firmware solutions are inherently inflexible. And finally if admins are expected to build a qdisc and filter rules to steer traffic this requires knowledge of how the hardware is currently configured. The number of tx queues and the queue offsets may change depending on resources. Also this approach incurs all the overhead of a qdisc with filters. With the mechanism in this patch users can set skb priority using expected methods ie setsockopt() or the stack can set the priority directly. Then the skb will be steered to the correct tx queues aligned with hardware QOS traffic classes. In the normal case with single traffic class and all queues in this class everything works as is until the LLD enables multiple tcs. To steer the skb we mask out the lower 4 bits of the priority and allow the hardware to configure upto 15 distinct classes of traffic. This is expected to be sufficient for most applications at any rate it is more then the 8021Q spec designates and is equal to the number of prio bands currently implemented in the default qdisc. This in conjunction with a userspace application such as lldpad can be used to implement 8021Q transmission selection algorithms one of these algorithms being the extended transmission selection algorithm currently being used for DCB. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>