aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2009-04-02memcg: fix OOM killer under memcgKAMEZAWA Hiroyuki1-1/+1
This patch tries to fix OOM Killer problems caused by hierarchy. Now, memcg itself has OOM KILL function (in oom_kill.c) and tries to kill a task in memcg. But, when hierarchy is used, it's broken and correct task cannot be killed. For example, in following cgroup /groupA/ hierarchy=1, limit=1G, 01 nolimit 02 nolimit All tasks' memory usage under /groupA, /groupA/01, groupA/02 is limited to groupA's 1Gbytes but OOM Killer just kills tasks in groupA. This patch provides makes the bad process be selected from all tasks under hierarchy. BTW, currently, oom_jiffies is updated against groupA in above case. oom_jiffies of tree should be updated. To see how oom_jiffies is used, please check mem_cgroup_oom_called() callers. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: const fix] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02debug cgroup: remove unneeded cgroup_lockLi Zefan1-2/+0
Since we are in cgroup write handler, so the cgrp is valid, so we don't have to hold cgroup_mutex when calling cgroup_task_count(). One similar example is in cgroup_tasks_open(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02cgroups: don't change release_agent when remount failedLi Zefan1-2/+3
Remount can fail in either case: - wrong mount options is specified, or option 'noprefix' is changed. - a to-be-added subsys is already mounted/active. When using remount to change 'release_agent', for the above former failure case, remount will return errno with release_agent unchanged, but for the latter case, remount will return EBUSY with relase_agent changed, which is unexpected I think: # mount -t cgroup -o cpu xxx /cgrp1 # mount -t cgroup -o cpuset,release_agent=agent1 yyy /cgrp2 # cat /cgrp2/release_agent agent1 # mount -t cgroup -o remount,cpuset,noprefix,release_agent=agent2 yyy /cgrp2 mount: /cgrp2 not mounted already, or bad option # cat /cgrp2/release_agent agent1 <-- ok # mount -t cgroup -o remount,cpu,cpuset,release_agent=agent2 yyy /cgrp2 mount: /cgrp2 is busy # cat /cgrp2/release_agent agent2 <-- unexpected! Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02cgroups: show correct file modeLi Zefan2-4/+35
We have some read-only files and write-only files, but currently they are all set to 0644, which is counter-intuitive and cause trouble for some cgroup tools like libcgroup. This patch adds 'mode' to struct cftype to allow cgroup subsys to set it's own files' file mode, and for the most cases cft->mode can be default to 0 and cgroup will figure out proper mode. Acked-by: Paul Menage <menage@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02kernel/cgroup.c: kfree(NULL) is legalJesper Juhl1-6/+3
Reduces object file size a bit: Before: $ size kernel/cgroup.o text data bss dec hex filename 21593 7804 4924 34321 8611 kernel/cgroup.o After: $ size kernel/cgroup.o text data bss dec hex filename 21537 7744 4924 34205 859d kernel/cgroup.o Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02cgroup: fix frequent -EBUSY at rmdirKAMEZAWA Hiroyuki1-14/+67
In following situation, with memory subsystem, /groupA use_hierarchy==1 /01 some tasks /02 some tasks /03 some tasks /04 empty When tasks under 01/02/03 hit limit on /groupA, hierarchical reclaim is triggered and the kernel walks tree under groupA. In this case, rmdir /groupA/04 fails with -EBUSY frequently because of temporal refcnt from the kernel. In general. cgroup can be rmdir'd if there are no children groups and no tasks. Frequent fails of rmdir() is not useful to users. (And the reason for -EBUSY is unknown to users.....in most cases) This patch tries to modify above behavior, by - retries if css_refcnt is got by someone. - add "return value" to pre_destroy() and allows subsystem to say "we're really busy!" Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02cgroup: CSS ID supportKAMEZAWA Hiroyuki1-1/+285
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code. This patch attaches unique ID to each css and provides following. - css_lookup(subsys, id) returns pointer to struct cgroup_subysys_state of id. - css_get_next(subsys, id, rootid, depth, foundid) returns the next css under "root" by scanning When cgroup_subsys->use_id is set, an id for css is maintained. The cgroup framework only parepares - css_id of root css for subsys - id is automatically attached at creation of css. - id is *not* freed automatically. Because the cgroup framework don't know lifetime of cgroup_subsys_state. free_css_id() function is provided. This must be called by subsys. There are several reasons to develop this. - Saving space .... For example, memcg's swap_cgroup is array of pointers to cgroup. But it is not necessary to be very fast. By replacing pointers(8bytes per ent) to ID (2byes per ent), we can reduce much amount of memory usage. - Scanning without lock. CSS_ID provides "scan id under this ROOT" function. By this, scanning css under root can be written without locks. ex) do { rcu_read_lock(); next = cgroup_get_next(subsys, id, root, &found); /* check sanity of next here */ css_tryget(); rcu_read_unlock(); id = found + 1 } while(...) Characteristics: - Each css has unique ID under subsys. - Lifetime of ID is controlled by subsys. - css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy - Allowed ID is 1-65535, ID 0 is UNUSED ID. Design Choices: - scan-by-ID v.s. scan-by-tree-walk. As /proc's pid scan does, scan-by-ID is robust when scanning is done by following kind of routine. scan -> rest a while(release a lock) -> conitunue from interrupted memcg's hierarchical reclaim does this. - When subsys->use_id is set, # of css in the system is limited to 65535. [bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroupsGrzegorz Nosek2-15/+10
The ns_proxy cgroup allows moving processes to child cgroups only one level deep at a time. This commit relaxes this restriction and makes it possible to attach tasks directly to grandchild cgroups, e.g.: ($pid is in the root cgroup) echo $pid > /cgroup/CG1/CG2/tasks Previously this operation would fail with -EPERM and would have to be performed as two steps: echo $pid > /cgroup/CG1/tasks echo $pid > /cgroup/CG1/CG2/tasks Also, the target cgroup no longer needs to be empty to move a task there. Signed-off-by: Grzegorz Nosek <root@localdomain.pl> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02Simplify copy_thread()Alexey Dobriyan1-1/+1
First argument unused since 2.3.11. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02nommu: fix a number of issues with the per-MM VMA patchDavid Howells1-0/+1
Fix a number of issues with the per-MM VMA patch: (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on a NOMMU system with more than 2G pages. Makes no difference on a 32-bit system. (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value, lest it overflow. (3) Move the allocation of the vm_area_struct slab back for fork.c. (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs. (5) Use BUG_ON() rather than if () BUG(). (6) Make the default validate_nommu_regions() a static inline rather than a #define. (7) Make free_page_series()'s objection to pages with a refcount != 1 more informative. (8) Adjust the __put_nommu_region() banner comment to indicate that the semaphore must be held for writing. (9) Limit the number of warnings about munmaps of non-mmapped regions. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key()Davide Libenzi1-4/+19
This patchset introduces wakeup hints for some of the most popular (from epoll POV) devices, so that epoll code can avoid spurious wakeups on its waiters. The problem with epoll is that the callback-based wakeups do not, ATM, carry any information about the events the wakeup is related to. So the only choice epoll has (not being able to call f_op->poll() from inside the callback), is to add the file* to a ready-list and resolve the real events later on, at epoll_wait() (or its own f_op->poll()) time. This can cause spurious wakeups, since the wake_up() itself might be for an event the caller is not interested into. The rate of these spurious wakeup can be pretty high in case of many network sockets being monitored. By allowing devices to report the events the wakeups refer to (at least the two major classes - POLLIN/POLLOUT), we are able to spare useless wakeups by proper handling inside the epoll's poll callback. Epoll will have in any case to call f_op->poll() on the file* later on, since the change to be done in order to have the full event set sent via wakeup, is too invasive for the way our f_op->poll() system works (the full event set is calculated inside the poll function - there are too many of them to even start thinking the change - also poll/select would need change too). Epoll is changed in a way that both devices which send event hints, and the ones that don't, are correctly handled. The former will gain some efficiency though. As a general rule for devices, would be to add an event mask by using key-aware wakeup macros, when making up poll wait queues. I tested it (together with the epoll's poll fix patch Andrew has in -mm) and wakeups for the supported devices are correctly filtered. Test program available here: http://www.xmailserver.org/epoll_test.c This patch: Nothing revolutionary here. Just using the available "key" that our wakeup core already support. The __wake_up_locked_key() was no brainer, since both __wake_up_locked() and __wake_up_locked_key() are thin wrappers around __wake_up_common(). The __wake_up_sync() function had a body, so the choice was between borrowing the body for __wake_up_sync_key() and calling it from __wake_up_sync(), or make an inline and calling it from both. I chose the former since in most archs it all resolves to "mov $0, REG; jmp ADDR". Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: William Lee Irwin III <wli@movementarian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01pm: rework includes, remove arch ifdefsMagnus Damm2-0/+2
Make the following header file changes: - remove arch ifdefs and asm/suspend.h from linux/suspend.h - add asm/suspend.h to disk.c (for arch_prepare_suspend()) - add linux/io.h to swsusp.c (for ioremap()) - x86 32/64 bit compile fixes Signed-off-by: Magnus Damm <damm@igel.co.jp> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: fix proc_dointvec_userhz_jiffies "breakage"Alexey Dobriyan1-1/+1
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838 On i386, HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange way from the user's point of view: # echo 500 >/proc/sys/vm/dirty_writeback_centisecs # cat /proc/sys/vm/dirty_writeback_centisecs 499 So, we have 5000 jiffies converted to only 499 clock ticks and reported back. TICK_NSEC = 999848 ACTHZ = 256039 Keeping in-kernel variable in units passed from userspace will fix issue of course, but this probably won't be right for every sysctl. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: introduce for_each_populated_zone() macroKOSAKI Motohiro2-15/+11
Impact: cleanup In almost cases, for_each_zone() is used with populated_zone(). It's because almost function doesn't need memoryless node information. Therefore, for_each_populated_zone() can help to make code simplify. This patch has no functional change. [akpm@linux-foundation.org: small cleanup] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumaskLinus Torvalds8-20/+29
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: oprofile: Thou shalt not call __exit functions from __init functions cpumask: remove the now-obsoleted pcibus_to_cpumask(): generic cpumask: remove cpumask_t from core cpumask: convert rcutorture.c cpumask: use new cpumask_ functions in core code. cpumask: remove references to struct irqaction's mask field. cpumask: use mm_cpumask() wrapper: kernel/fork.c cpumask: use set_cpu_active in init/main.c cpumask: remove node_to_first_cpu cpumask: fix seq_bitmap_*() functions. cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL
2009-03-30Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds11-317/+597
* 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits) lockdep: fix deadlock in lockdep_trace_alloc lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB lockdep: annotate reclaim context (__GFP_NOFS), fix lockdep: build fix for !PROVE_LOCKING lockstat: warn about disabled lock debugging lockdep: use stringify.h lockdep: simplify check_prev_add_irq() lockdep: get_user_chars() redo lockdep: simplify get_user_chars() lockdep: add comments to mark_lock_irq() lockdep: remove macro usage from mark_held_locks() lockdep: fully reduce mark_lock_irq() lockdep: merge the !_READ mark_lock_irq() helpers lockdep: merge the _READ mark_lock_irq() helpers lockdep: simplify mark_lock_irq() helpers #3 lockdep: further simplify mark_lock_irq() helpers lockdep: simplify the mark_lock_irq() helpers lockdep: split up mark_lock_irq() lockdep: generate usage strings lockdep: generate the state bit definitions ...
2009-03-30lockdep: fix deadlock in lockdep_trace_allocPeter Zijlstra1-2/+19
Heiko reported that we grab the graph lock with irqs enabled. Fix this by providng the same wrapper as all other lockdep entry functions have. Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <npiggin@suse.de> LKML-Reference: <1237544000.24626.52.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-30kexec: Change kexec jump code orderingRafael J. Wysocki1-10/+9
Change the ordering of the kexec jump code so that the nonboot CPUs are disabled after calling device drivers' "late suspend" methods. This change reflects the recent modifications of the power management code that is also used by kexec jump. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu>
2009-03-30PM: Change hibernation code orderingRafael J. Wysocki1-48/+61
Change the ordering of the hibernation core code so that the platform "prepare" callbacks are executed and the nonboot CPUs are disabled after calling device drivers' "late suspend" methods. This change (along with the previous analogous change of the suspend core code) will allow us to rework the PCI PM core so that the power state of devices is changed in the "late" phase of suspend (and analogously in the "early" phase of resume), which in turn will allow us to avoid the race condition where a device using shared interrupts is put into a low power state with interrupts enabled and then an interrupt (for another device) comes in and confuses its driver. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu>
2009-03-30PM: Change suspend code orderingRafael J. Wysocki1-16/+22
Change the ordering of the suspend core code so that the platform "prepare" callback is executed and the nonboot CPUs are disabled after calling device drivers' "late suspend" methods. This change will allow us to rework the PCI PM core so that the power state of devices is changed in the "late" phase of suspend (and analogously in the "early" phase of resume), which in turn will allow us to avoid the race condition where a device using shared interrupts is put into a low power state with interrupts enabled and then an interrupt (for another device) comes in and confuses its driver. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu>
2009-03-30PM: Rework handling of interrupts during suspend-resumeRafael J. Wysocki3-20/+44
Use the functions introduced in by the previous patch, suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(), to rework the handling of interrupts during suspend (hibernation) and resume. Namely, interrupts will only be disabled on the CPU right before suspending sysdevs, while device drivers will be prevented from receiving interrupts, with the help of the new helper function, before their "late" suspend callbacks run (and analogously during resume). In addition, since the device interrups are now disabled before the CPU has turned all interrupts off and the CPU will ACK the interrupts setting the IRQ_PENDING bit for them, check in sysdev_suspend() if any wake-up interrupts are pending and abort suspend if that's the case. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu>
2009-03-30PM: Introduce functions for suspending and resuming device interruptsRafael J. Wysocki4-7/+106
Introduce helper functions allowing us to prevent device drivers from getting any interrupts (without disabling interrupts on the CPU) during suspend (or hibernation) and to make them start to receive interrupts again during the subsequent resume. These functions make it possible to keep timer interrupts enabled while the "late" suspend and "early" resume callbacks provided by device drivers are being executed. In turn, this allows device drivers' "late" suspend and "early" resume callbacks to sleep, execute ACPI callbacks etc. The functions introduced here will be used to rework the handling of interrupts during suspend (hibernation) and resume. Namely, interrupts will only be disabled on the CPU right before suspending sysdevs, while device drivers will be prevented from receiving interrupts, with the help of the new helper function, before their "late" suspend callbacks run (and analogously during resume). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu>
2009-03-30cpumask: remove cpumask_t from coreRusty Russell2-2/+2
Impact: cleanup struct cpumask is nicer, and we use it to make where we've made code safe for CONFIG_CPUMASK_OFFSTACK=y. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30cpumask: convert rcutorture.cRusty Russell1-8/+17
We're getting rid of cpumasks on the stack. Simply change tmp_mask to a global, and allocate it in rcu_torture_init(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@freedesktop.org>
2009-03-30cpumask: use new cpumask_ functions in core code.Rusty Russell1-3/+3
Impact: cleanup Time to clean up remaining laggards using the old cpu_ functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Trond.Myklebust@netapp.com
2009-03-30cpumask: use mm_cpumask() wrapper: kernel/fork.cRusty Russell1-1/+1
Impact: futureproof Makes code futureproof against the impending change to mm->cpu_vm_mask. It's also a chance to use the new cpumask_ ops which take a pointer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30cpumask: use set_cpu_active in init/main.cRusty Russell1-3/+3
cpu_active_map is deprecated in favor of cpu_active_mask, which is const for safety: we use accessors now (set_cpu_active) is we really want to make a change. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALLRusty Russell2-3/+3
Impact: cleanup (Thanks to Al Viro for reminding me of this, via Ingo) CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so: #define CPU_MASK_ALL (cpumask_t) { { ... } } Taking the address of such a temporary is questionable at best, unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added CPU_MASK_ALL_PTR: #define CPU_MASK_ALL_PTR (&CPU_MASK_ALL) Which formalizes this practice. One day gcc could bite us over this usage (though we seem to have gotten away with it so far). So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR with the modern "cpu_all_mask" (a real const struct cpumask *). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Mike Travis <travis@sgi.com>
2009-03-30Merge commit 'origin/master' into nextBenjamin Herrenschmidt36-1056/+2265
Manual merge of: arch/powerpc/include/asm/elf.h drivers/i2c/busses/i2c-mpc.c
2009-03-29sched: fix errors in struct & function commentsRandy Dunlap1-7/+8
Fix kernel-doc errors in sched.c: the structs don't have kernel-doc notation and the short function description needs to be one line only. Error(kernel/sched.c:3197): cannot understand prototype: 'struct sd_lb_stats ' Error(kernel/sched.c:3228): cannot understand prototype: 'struct sg_lb_stats ' Error(kernel/sched.c:3375): duplicate section name 'Description' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-28Merge branch 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds1-125/+76
* 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: remove the pointer math from double_unlock_hb, fix futex: remove the pointer math from double_unlock_hb futex: clean up fault logic futex: unlock before returning -EFAULT futex: use current->time_slack_ns for rt tasks too futex: add double_unlock_hb() futex: additional (get|put)_futex_key() fixes futex: update futex commentary
2009-03-28Merge branch 'linus' into core/futexesIngo Molnar35-931/+2206
2009-03-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-for-30Linus Torvalds1-13/+5
* git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-for-30: fastboot: remove duplicate unpack_to_rootfs() ide/net: flip the order of SATA and network init async: remove the temporary (2.6.29) "async is off by default" code Fix up conflicts in init/initramfs.c manually
2009-03-28async: remove the temporary (2.6.29) "async is off by default" codeArjan van de Ven1-13/+5
Now that everyone has been able to test the async code (and it's being used in the Moblin betas by default), we can enable it by default. The various fixes needed have gone into 2.6.29 already. [With an important bugfix from Stefan Richter] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2009-03-28Merge branch 'linus' into percpu-cpumask-x86-for-linus-2Ingo Molnar1-2/+3
Conflicts: arch/sparc/kernel/time_64.c drivers/gpu/drm/drm_proc.c Manual merge to resolve build warning due to phys_addr_t type change on x86: drivers/gpu/drm/drm_info.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-27Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6Linus Torvalds1-2/+3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...
2009-03-27vfs: simple_set_mnt() should return voidSukadev Bhattiprolu1-1/+2
simple_set_mnt() is defined as returning 'int' but always returns 0. Callers assume simple_set_mnt() never fails and don't properly cleanup if it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev() should: up_write(sb->s_unmount); deactivate_super(sb); if simple_set_mnt() fails. Since simple_set_mnt() never fails, would be cleaner if it did not return anything. [akpm@linux-foundation.org: fix build] Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-27constify dentry_operations: restAl Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-27Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2Ingo Molnar16-82/+167
Conflicts: arch/parisc/kernel/irq.c arch/x86/include/asm/fixmap_64.h arch/x86/include/asm/setup.h kernel/irq/handle.c Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-26Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds6-106/+158
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits) x86: disable __do_IRQ support sparseirq, powerpc/cell: fix unused variable warning in interrupt.c genirq: deprecate obsolete typedefs and defines genirq: deprecate __do_IRQ genirq: add doc to struct irqaction genirq: use kzalloc instead of explicit zero initialization genirq: make irqreturn_t an enum genirq: remove redundant if condition genirq: remove unused hw_irq_controller typedef irq: export remove_irq() and setup_irq() symbols irq: match remove_irq() args with setup_irq() irq: add remove_irq() for freeing of setup_irq() irqs genirq: assert that irq handlers are indeed running in hardirq context irq: name 'p' variables a bit better irq: further clean up the free_irq() code flow irq: refactor and clean up the free_irq() code flow irq: clean up manage.c irq: use GFP_KERNEL for action allocation in request_irq() kernel/irq: fix sparse warning: make symbol static irq: optimize init_kstat_irqs/init_copy_kstat_irqs ...
2009-03-26Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds5-218/+361
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits) posix timers: fix RLIMIT_CPU && fork() time: ntp: fix bug in ntp_update_offset() & do_adjtimex(), fix time: ntp: clean up second_overflow() time: ntp: simplify ntp_tick_adj calculations time: ntp: make 64-bit constants more robust time: ntp: refactor do_adjtimex() some more time: ntp: refactor do_adjtimex() time: ntp: fix bug in ntp_update_offset() & do_adjtimex() time: ntp: micro-optimize ntp_update_offset() time: ntp: simplify ntp_update_offset_fll() time: ntp: refactor and clean up ntp_update_offset() time: ntp: refactor up ntp_update_frequency() time: ntp: clean up ntp_update_frequency() time: ntp: simplify the MAX_TICKADJ_SCALED definition time: ntp: simplify the second_overflow() code flow time: ntp: clean up kernel/time/ntp.c x86: hpet: stop HPET_COUNTER when programming periodic mode x86: hpet: provide separate functions to stop and start the counter x86: hpet: print HPET registers during setup (if hpet=verbose is used) time: apply NTP frequency/tick changes immediately ...
2009-03-26Merge branch 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds8-494/+1215
* 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits) sched: Add comments to find_busiest_group() function sched: Refactor the power savings balance code sched: Optimize the !power_savings_balance during fbg() sched: Create a helper function to calculate imbalance sched: Create helper to calculate small_imbalance in fbg() sched: Create a helper function to calculate sched_domain stats for fbg() sched: Define structure to store the sched_domain statistics for fbg() sched: Create a helper function to calculate sched_group stats for fbg() sched: Define structure to store the sched_group statistics for fbg() sched: Fix indentations in find_busiest_group() using gotos sched: Simple helper functions for find_busiest_group() sched: remove unused fields from struct rq sched: jiffies not printed per CPU sched: small optimisation of can_migrate_task() sched: fix typos in documentation sched: add avg_overlap decay x86, sched_clock(): mark variables read-mostly sched: optimize ttwu vs group scheduling sched: TIF_NEED_RESCHED -> need_reshed() cleanup sched: don't rebalance if attached on NULL domain ...
2009-03-26Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller6-42/+51
Conflicts: drivers/net/wimax/i2400m/usb-notif.c
2009-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds1-15/+10
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (61 commits) Dynamic debug: fix pr_fmt() build error Dynamic debug: allow simple quoting of words dynamic debug: update docs dynamic debug: combine dprintk and dynamic printk sysfs: fix some bin_vm_ops errors kobject: don't block for each kobject_uevent sysfs: only allow one scheduled removal callback per kobj Driver core: Fix device_move() vs. dpm list ordering, v2 Driver core: some cleanup on drivers/base/sys.c Driver core: implement uevent suppress in kobject vcs: hook sysfs devices into object lifetime instead of "binding" driver core: fix passing platform_data driver core: move platform_data into platform_device sysfs: don't block indefinitely for unmapped files. driver core: move knode_bus into private structure driver core: move knode_driver into private structure driver core: move klist_children into private structure driver core: create a private portion of struct device driver core: remove polling for driver_probe_done(v5) sysfs: reference sysfs_dirent from sysfs inodes ... Fixed conflicts in drivers/sh/maple/maple.c manually
2009-03-26Merge branches 'timers/new-apis', 'timers/ntp' and 'timers/urgent' into timers/coreIngo Molnar4-209/+350
2009-03-26Merge commit 'v2.6.29' into timers/coreIngo Molnar34-202/+468
2009-03-25sched: Add comments to find_busiest_group() functionGautham R Shenoy1-8/+42
Impact: cleanup Add /** style comments around find_busiest_group(). Also add a few explanatory comments. This concludes the find_busiest_group() cleanup. The function is now down to 72 lines from the original 313 lines. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Balbir Singh" <balbir@in.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com> LKML-Reference: <20090325091427.13992.18933.stgit@sofia.in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25sched: Refactor the power savings balance codeGautham R Shenoy1-83/+153
Impact: cleanup Create seperate helper functions to initialize the power-savings-balance related variables, to update them and to check if we have a scope for performing power-savings balance. Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT) case. This will eliminate all the #ifdef jungle in find_busiest_group() and the other helper functions. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Balbir Singh" <balbir@in.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com> LKML-Reference: <20090325091422.13992.73616.stgit@sofia.in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25sched: Optimize the !power_savings_balance during fbg()Gautham R Shenoy1-9/+14
Impact: cleanup, micro-optimization We don't need to perform power_savings balance if either the cpu is NOT_IDLE or if the sched_domain doesn't contain the SD_POWERSAVINGS_BALANCE flag set. Currently, we check for these conditions multiple number of times, even though these variables don't change over the scope of find_busiest_group(). Check once, and store the value in the already exiting "power_savings_balance" variable. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Balbir Singh" <balbir@in.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com> LKML-Reference: <20090325091417.13992.2657.stgit@sofia.in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25sched: Create a helper function to calculate imbalanceGautham R Shenoy1-33/+45
Move all the imbalance calculation out of find_busiest_group() through this helper function. With this change, the structure of find_busiest_group() will be as follows: - update_sched_domain_statistics. - check if imbalance exits. - update imbalance and return busiest. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Balbir Singh" <balbir@in.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com> LKML-Reference: <20090325091411.13992.43293.stgit@sofia.in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>