aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2012-12-11Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds21-359/+451
Pull perf updates from Ingo Molnar: "Lots of activity: 211 files changed, 8328 insertions(+), 4116 deletions(-) most of it on the tooling side. Main changes: * ftrace enhancements and fixes from Steve Rostedt. * uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. * UAPI fixes, from David Howels - prepares the arch/x86 UAPI transition * Separate perf tests into multiple objects, one per test, from Jiri Olsa. * Make hardware event translations available in sysfs, from Jiri Olsa. * Fixes to /proc/pid/maps parsing, preparatory to supporting data maps, from Namhyung Kim * Implement ui_progress for GTK, from Namhyung Kim * Add framework for automated perf_event_attr tests, where tools with different command line options will be run from a 'perf test', via python glue, and the perf syscall will be intercepted to verify that the perf_event_attr fields set by the tool are those expected, from Jiri Olsa * Add a 'link' method for hists, so that we can have the leader with buckets for all the entries in all the hists. This new method is now used in the default 'diff' output, making the sum of the 'baseline' column be 100%, eliminating blind spots. * libtraceevent fixes for compiler warnings trying to make perf it build on some distros, like fedora 14, 32-bit, some of the warnings really pointed to real bugs. * Add a browser for 'perf script' and make it available from the report and annotate browsers. It does filtering to find the scripts that handle events found in the perf.data file used. From Feng Tang * perf inject changes to allow showing where a task sleeps, from Andrew Vagin. * Makefile improvements from Namhyung Kim. * Add --pre and --post command hooks in 'stat', from Peter Zijlstra. * Don't stop synthesizing threads when one vanishes, this is for the existing threads when we start a tool like trace. * Use sched:sched_stat_runtime to provide a thread summary, this produces the same output as the 'trace summary' subcommand of tglx's original "trace" tool. * Support interrupted syscalls in 'trace' * Add an event duration column and filter in 'trace'. * There are references to the man pages in some tools, so try to build Documentation when installing, warning the user if that is not possible, from Borislav Petkov. * Give user better message if precise is not supported, from David Ahern. * Try to find cross-built objdump path by using the session environment information in the perf.data file header, from Irina Tirdea, original patch and idea by Namhyung Kim. * Diplays more output on features check for make V=1, so that one can figure out what is happening by looking at gcc output, etc. From Jiri Olsa. * Add on_exit implementation for systems without one, e.g. Android, from Bernhard Rosenkraenzer. * Only process events for vcpus of interest, helps handling large number of events, from David Ahern. * Cross compilation fixes for Android, from Irina Tirdea. * Add documentation on compiling for Android, from Irina Tirdea. * perf diff improvements from Jiri Olsa. * Target (task/user/cpu/syswide) handling improvements, from Namhyung Kim. * Add support in 'trace' for tracing workload given by command line, from Namhyung Kim. * ... and much more." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits) uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race perf evsel: Introduce is_group_member method perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing perf ui: Always compile browser setup code perf ui: Add ui_progress__finish() perf ui gtk: Implement ui_progress functions perf ui: Introduce generic ui_progress helper perf ui tui: Move progress.c under ui/tui directory perf tools: Add basic event modifier sanity check perf tools: Omit group members from perf_evlist__disable/enable perf tools: Ensure single disable call per event in record comand perf tools: Fix 'disabled' attribute config for record command perf tools: Fix attributes for '{}' defined event groups perf tools: Use sscanf for parsing /proc/pid/maps perf tools: Add gtk.<command> config option for launching GTK browser perf tools: Fix compile error on NO_NEWT=1 build perf hists: Initialize all of he->stat with zeroes ...
2012-12-11Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-2/+48
Pull irq fixes from Ingo Molnar: "Affinity fixes and a nested threaded IRQ handling fix." * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Always force thread affinity irq: Set CPU affinity right on thread creation genirq: Provide means to retrigger parent
2012-12-11Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds14-352/+1014
Pull RCU update from Ingo Molnar: "The major features of this tree are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486." * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) context_tracking: New context tracking susbsystem sched: Mark RCU reader in sched_show_task() rcu: Separate accounting of callbacks from callback-free CPUs rcu: Add callback-free CPUs rcu: Add documentation for the new rcuexp debugfs trace file rcu: Update documentation for TREE_RCU debugfs tracing rcu: Reduce default RCU CPU stall warning timeout rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check rcu: Clarify memory-ordering properties of grace-period primitives rcu: Add new rcutorture module parameters to start/end test messages rcu: Remove list_for_each_continue_rcu() rcu: Fix batch-limit size problem rcu: Add tracing for synchronize_sched_expedited() rcu: Remove old debugfs interfaces and also RCU flavor name rcu: split 'rcuhier' to each flavor rcu: split 'rcugp' to each flavor rcu: split 'rcuboost' to each flavor rcu: split 'rcubarrier' to each flavor rcu: Fix tracing formatting rcu: Remove the interface "rcudata.csv" ...
2012-12-11Merge branches 'core-locking-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-2/+2
Pull trivial fix branches from Ingo Molnar. Cleanup in __get_key_name, and a timer comment fixlet. * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name() * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers, sched: Correct the comments for tick_sched_timer()
2012-12-11Merge tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds2-2/+12
Pull TTY/Serial merge from Greg Kroah-Hartman: "Here's the big tty/serial tree set of changes for 3.8-rc1. Contained in here is a bunch more reworks of the tty port layer from Jiri and bugfixes from Alan, along with a number of other tty and serial driver updates by the various driver authors. Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY layer, which is much appreciated by me. All of these have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up some trivial conflicts in the staging tree, due to the fwserial driver having come in both ways (but fixed up a bit in the serial tree), and the ioctl handling in the dgrp driver having been done slightly differently (staging tree got that one right, and removed both TIOCGSOFTCAR and TIOCSSOFTCAR). * tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits) staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer() staging/fwserial: Remove superfluous free staging/fwserial: Use WARN_ONCE when port table is corrupted staging/fwserial: Destruct embedded tty_port on teardown staging/fwserial: Fix build breakage when !CONFIG_BUG staging: fwserial: Add TTY-over-Firewire serial driver drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user() staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver serial: ifx6x60: Add modem power off function in the platform reboot process serial: mxs-auart: unmap the scatter list before we copy the data serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled serial: max310x: Setup missing "can_sleep" field for GPIO tty/serial: fix ifx6x60.c declaration warning serial: samsung: add devicetree properties for non-Exynos SoCs serial: samsung: fix potential soft lockup during uart write tty: vt: Remove redundant null check before kfree. tty/8250 Add check for pci_ioremap_bar failure tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards tty/8250 Add XR17D15x devices to the exar_handle_irq override ...
2012-12-11Merge tag 'driver-core-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds2-6/+3
Pull driver core updates from Greg Kroah-Hartman: "Here's the large driver core updates for 3.8-rc1. The biggest thing here is the various __dev* marking removals. This is going to be a pain for the merge with different subsystem trees, I know, but all of the patches included here have been ACKed by their various subsystem maintainers, as they wanted them to go through here. If this is too much of a pain, I can pull all of them out of this tree and just send you one with the other fixes/updates and then, after 3.8-rc1 is out, do the rest of the removals to ensure we catch them all, it's up to you. The merges should all be trivial, and Stephen has been doing them all in linux-next for a few weeks now quite easily. Other than the __dev* marking removals, there's nothing major here, some firmware loading updates and other minor things in the driver core. All of these have (much to Stephen's annoyance), been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up trivial conflicts in drivers/gpio/gpio-{em,stmpe}.c due to gpio update. * tag 'driver-core-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (93 commits) modpost.c: Stop checking __dev* section mismatches init.h: Remove __dev* sections from the kernel acpi: remove use of __devinit PCI: Remove __dev* markings PCI: Always build setup-bus when PCI is enabled PCI: Move pci_uevent into pci-driver.c PCI: Remove CONFIG_HOTPLUG ifdefs unicore32/PCI: Remove CONFIG_HOTPLUG ifdefs sh/PCI: Remove CONFIG_HOTPLUG ifdefs powerpc/PCI: Remove CONFIG_HOTPLUG ifdefs mips/PCI: Remove CONFIG_HOTPLUG ifdefs microblaze/PCI: Remove CONFIG_HOTPLUG ifdefs dma: remove use of __devinit dma: remove use of __devexit_p firewire: remove use of __devinitdata firewire: remove use of __devinit leds: remove use of __devexit leds: remove use of __devinit leds: remove use of __devexit_p mmc: remove use of __devexit ...
2012-12-11Merge tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds5-6/+75
Pull ACPI and power management updates from Rafael Wysocki: - Introduction of device PM QoS flags. - ACPI device power management update allowing subsystems other than PCI to use it more easily. - ACPI device enumeration rework allowing additional kinds of devices to be enumerated via ACPI. From Mika Westerberg, Adrian Hunter, Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki. - ACPICA update to version 20121018 from Bob Moore and Lv Zheng. - ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu. - Introduction of acpi_handle_<level>() messaging macros and ACPI-based CPU hot-remove support from Toshi Kani. - ACPI EC updates from Feng Tang. - cpufreq updates from Viresh Kumar, Fabio Baltieri and others. - cpuidle changes to quickly notice governor prediction failure from Youquan Song. - Support for using multiple cpuidle drivers at the same time and cpuidle cleanups from Daniel Lezcano. - devfreq updates from Nishanth Menon and others. - cpupower update from Thomas Renninger. - Fixes and small cleanups all over the place. * tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (196 commits) mmc: sdhci-acpi: enable runtime-pm for device HID INT33C6 ACPI: add Haswell LPSS devices to acpi_platform_device_ids list ACPI: add documentation about ACPI 5 enumeration pnpacpi: fix incorrect TEST_ALPHA() test ACPI / PM: Fix header of acpi_dev_pm_detach() in acpi.h ACPI / video: ignore BIOS initial backlight value for HP Folio 13-2000 ACPI : do not use Lid and Sleep button for S5 wakeup ACPI / PNP: Do not crash due to stale pointer use during system resume ACPI / video: Add "Asus UL30VT" to ACPI video detect blacklist ACPI: do acpisleep dmi check when CONFIG_ACPI_SLEEP is set spi / ACPI: add ACPI enumeration support gpio / ACPI: add ACPI support PM / devfreq: remove compiler error with module governors (2) cpupower: IvyBridge (0x3a and 0x3e models) support cpupower: Provide -c param for cpupower monitor to schedule process on all cores cpupower tools: Fix warning and a bug with the cpu package count cpupower tools: Fix malloc of cpu_info structure cpupower tools: Fix issues with sysfs_topology_read_file cpupower tools: Fix minor warnings cpupower tools: Update .gitignore for files created in the debug directories ...
2012-12-11Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds1-2/+2
Pull irqdomain changes from Grant Likely: "Trivial changes to irqdomain. An update to the documentation and make one of the error paths not quite so obnoxious." * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: irqdomain: update documentation irqdomain: stop screaming about preallocated irqdescs
2012-12-11mm: sched: numa: Delay PTE scanning until a task is scheduled on a new nodeMel Gorman3-1/+24
Due to the fact that migrations are driven by the CPU a task is running on there is no point tracking NUMA faults until one task runs on a new node. This patch tracks the first node used by an address space. Until it changes, PTE scanning is disabled and no NUMA hinting faults are trapped. This should help workloads that are short-lived, do not care about NUMA placement or have bound themselves to a single node. This takes advantage of the logic in "mm: sched: numa: Implement slow start for working set sampling" to delay when the checks are made. This will take advantage of processes that set their CPU and node bindings early in their lifetime. It will also potentially allow any initial load balancing to take place. Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUGMel Gorman2-1/+16
The "mm: sched: numa: Control enabling and disabling of NUMA balancing" depends on scheduling debug being enabled but it's perfectly legimate to disable automatic NUMA balancing even without this option. This should take care of it. Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11mm: sched: numa: Control enabling and disabling of NUMA balancingMel Gorman3-17/+40
This patch adds Kconfig options and kernel parameters to allow the enabling and disabling of automatic NUMA balancing. The existance of such a switch was and is very important when debugging problems related to transparent hugepages and we should have the same for automatic NUMA placement. Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrateMel Gorman3-8/+29
The PTE scanning rate and fault rates are two of the biggest sources of system CPU overhead with automatic NUMA placement. Ideally a proper policy would detect if a workload was properly placed, schedule and adjust the PTE scanning rate accordingly. We do not track the necessary information to do that but we at least know if we migrated or not. This patch scans slower if a page was not migrated as the result of a NUMA hinting fault up to sysctl_numa_balancing_scan_period_max which is now higher than the previous default. Once every minute it will reset the scanner in case of phase changes. This is hilariously crude and the numbers are arbitrary. Workloads will converge quite slowly in comparison to what a proper policy should be able to do. On the plus side, we will chew up less CPU for workloads that have no need for automatic balancing. Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11sched: numa: Slowly increase the scanning period as NUMA faults are handledMel Gorman1-1/+10
Currently the rate of scanning for an address space is controlled by the individual tasks. The next scan is simply determined by 2*p->numa_scan_period. The 2*p->numa_scan_period is arbitrary and never changes. At this point there is still no proper policy that decides if a task or process is properly placed. It just scans and assumes the next NUMA fault will place it properly. As it is assumed that pages will get properly placed over time, increase the scan window each time a fault is incurred. This is a big assumption as noted in the comments. It should be noted that changing to p->numa_scan_period will increase system CPU usage because now the scanning rate has effectively doubled. If that is a problem then the min_rate should be made 200ms instead of restoring the 2* logic. Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11mm: numa: Rate limit setting of pte_numa if node is saturatedMel Gorman1-0/+9
If there are a large number of NUMA hinting faults and all of them are resulting in migrations it may indicate that memory is just bouncing uselessly around. NUMA balancing cost is likely exceeding any benefit from locality. Rate limit the PTE updates if the node is migration rate-limited. As noted in the comments, this distorts the NUMA faulting statistics. Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11mm: sched: numa: Implement slow start for working set samplingPeter Zijlstra3-1/+13
Add a 1 second delay before starting to scan the working set of a task and starting to balance it amongst nodes. [ note that before the constant per task WSS sampling rate patch the initial scan would happen much later still, in effect that patch caused this regression. ] The theory is that short-run tasks benefit very little from NUMA placement: they come and go, and they better stick to the node they were started on. As tasks mature and rebalance to other CPUs and nodes, so does their NUMA placement have to change and so does it start to matter more and more. In practice this change fixes an observable kbuild regression: # [ a perf stat --null --repeat 10 test of ten bzImage builds to /dev/shm ] !NUMA: 45.291088843 seconds time elapsed ( +- 0.40% ) 45.154231752 seconds time elapsed ( +- 0.36% ) +NUMA, no slow start: 46.172308123 seconds time elapsed ( +- 0.30% ) 46.343168745 seconds time elapsed ( +- 0.25% ) +NUMA, 1 sec slow start: 45.224189155 seconds time elapsed ( +- 0.25% ) 45.160866532 seconds time elapsed ( +- 0.17% ) and it also fixes an observable perf bench (hackbench) regression: # perf stat --null --repeat 10 perf bench sched messaging -NUMA: -NUMA: 0.246225691 seconds time elapsed ( +- 1.31% ) +NUMA no slow start: 0.252620063 seconds time elapsed ( +- 1.13% ) +NUMA 1sec delay: 0.248076230 seconds time elapsed ( +- 1.35% ) The implementation is simple and straightforward, most of the patch deals with adding the /proc/sys/kernel/numa_balancing_scan_delay_ms tunable knob. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> [ Wrote the changelog, ran measurements, tuned the default. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>
2012-12-11sched, numa, mm: Count WS scanning against present PTEs, not virtual memory rangesMel Gorman1-15/+21
By accounting against the present PTEs, scanning speed reflects the actual present (mapped) memory. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) ratePeter Zijlstra2-13/+59
Previously, to probe the working set of a task, we'd use a very simple and crude method: mark all of its address space PROT_NONE. That method has various (obvious) disadvantages: - it samples the working set at dissimilar rates, giving some tasks a sampling quality advantage over others. - creates performance problems for tasks with very large working sets - over-samples processes with large address spaces but which only very rarely execute Improve that method by keeping a rotating offset into the address space that marks the current position of the scan, and advance it by a constant rate (in a CPU cycles execution proportional manner). If the offset reaches the last mapped address of the mm then it then it starts over at the first address. The per-task nature of the working set sampling functionality in this tree allows such constant rate, per task, execution-weight proportional sampling of the working set, with an adaptive sampling interval/frequency that goes from once per 100ms up to just once per 8 seconds. The current sampling volume is 256 MB per interval. As tasks mature and converge their working set, so does the sampling rate slow down to just a trickle, 256 MB per 8 seconds of CPU time executed. This, beyond being adaptive, also rate-limits rarely executing systems and does not over-sample on overloaded systems. [ In AutoNUMA speak, this patch deals with the effective sampling rate of the 'hinting page fault'. AutoNUMA's scanning is currently rate-limited, but it is also fundamentally single-threaded, executing in the knuma_scand kernel thread, so the limit in AutoNUMA is global and does not scale up with the number of CPUs, nor does it scan tasks in an execution proportional manner. So the idea of rate-limiting the scanning was first implemented in the AutoNUMA tree via a global rate limit. This patch goes beyond that by implementing an execution rate proportional working set sampling rate that is not implemented via a single global scanning daemon. ] [ Dan Carpenter pointed out a possible NULL pointer dereference in the first version of this patch. ] Based-on-idea-by: Andrea Arcangeli <aarcange@redhat.com> Bug-Found-By: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> [ Wrote changelog and fixed bug. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>
2012-12-11mm: numa: Add fault driven placement and migrationPeter Zijlstra5-2/+173
NOTE: This patch is based on "sched, numa, mm: Add fault driven placement and migration policy" but as it throws away all the policy to just leave a basic foundation I had to drop the signed-offs-by. This patch creates a bare-bones method for setting PTEs pte_numa in the context of the scheduler that when faulted later will be faulted onto the node the CPU is running on. In itself this does nothing useful but any placement policy will fundamentally depend on receiving hints on placement from fault context and doing something intelligent about it. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com>
2012-12-11Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled"Ingo Molnar3-14/+69
This reverts commit 5258f386ea4e8454bc801fb443e8a4217da1947c, because the underlying autogroups bug got fixed upstream in a better way, via: fd8ef11730f1 Revert "sched, autogroup: Stop going ahead if autogroup is disabled" Cc: Mike Galbraith <efault@gmx.de> Cc: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/coreIngo Molnar3-38/+83
Pull ftrace updates from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/coreIngo Molnar3-16/+31
Pull uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08Merge tag 'sched-cputime-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/coreIngo Molnar1-12/+19
Pull more cputime cleanups from Frederic Weisbecker: * Get rid of underscores polluting the vtime namespace * Consolidate context switch and tick handling * Improve debuggability by detecting irq unsafe callers Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08Merge tag 'cputime-adjustment-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/coreIngo Molnar10-119/+96
Pull cputime cleanups from Frederic Weisbecker: * Improve naming and code location * Consolidate adjustment code * Comment the adjustement code Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08Merge branch 'linus' into perf/coreIngo Molnar12-99/+116
Conflicts: tools/perf/Makefile tools/perf/builtin-test.c tools/perf/perf.h tools/perf/tests/parse-events.c tools/perf/util/evsel.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-07Merge branch 'linus' into sched/coreIngo Molnar12-99/+116
Pick up the autogroups fix and other fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-06cgroup_rm_file: don't delete the uncreated filesGao feng1-6/+6
in cgroup_add_file,when creating files for cgroup, some of creation may be skipped. So we need to avoid deleting these uncreated files in cgroup_rm_file, otherwise the warning msg will be triggered. "cgroup_addrm_files: failed to remove memory_pressure_enabled, err=-2" Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@redhat.com> Cc: stable@vger.kernel.org
2012-12-06Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linuxLinus Torvalds1-7/+7
Pull module signing fixes from Rusty Russell: "David gave me these a month ago, during my git workflow churn :(" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: ASN.1: Fix an indefinite length skip error MODSIGN: Don't use enum-type bitfields in module signature info block
2012-12-06Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+3
Pull watchdog fix from Thomas Gleixner: "Trivial CPU hotplug regression fix for the watchdog code" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: watchdog: Fix CPU hotplug regression
2012-12-06propagate name change to comments in kernel sourceNadia Yvette Chambers8-11/+12
I've legally changed my name with New York State, the US Social Security Administration, et al. This patch propagates the name change and change in initials and login to comments in the kernel source as well. Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06padata: use __this_cpu_read per-cpu helperShan Wei1-3/+2
For bottom halves off, __this_cpu_read is better. Signed-off-by: Shan Wei <davidshan@tencent.com> Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-12-05MODSIGN: Don't use enum-type bitfields in module signature info blockDavid Howells1-7/+7
Don't use enum-type bitfields in the module signature info block as we can't be certain how the compiler will handle them. As I understand it, it is arch dependent, and it is possible for the compiler to rearrange them based on endianness and to insert a byte of padding to pad the three enums out to four bytes. Instead use u8 fields for these, which the compiler should emit in the right order without padding. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-04watchdog: Fix CPU hotplug regressionThomas Gleixner1-0/+3
Norbert reported: "3.7-rc6 booted with nmi_watchdog=0 fails to suspend to RAM or offline CPUs. It's reproducable with a KVM guest and physical system." The reason is that commit bcd951cf(watchdog: Use hotplug thread infrastructure) missed to take this into account. So the cpu offline code gets stuck in the teardown function because it accesses non initialized data structures. Add a check for watchdog_enabled into that path to cure the issue. Reported-and-tested-by: Norbert Warmuth <nwarmuth@t-online.de> Tested-by: Joseph Salisbury <joseph.salisbury@canonical.com> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1211231033230.2701@ionos Link: http://bugs.launchpad.net/bugs/1079534 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-12-04Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linuxLinus Torvalds1-2/+2
Pull module fixes from Rusty Russell: "Module signing build fixes for blackfin and metag" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: modsign: add symbol prefix to certificate list linux/kernel.h: define SYMBOL_PREFIX
2012-12-04Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-2/+2
Pull workqueue fixes from Tejun Heo: "So, safe fixes my ass. Commit 8852aac25e79 ("workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay") had the side-effect of performing delayed_work sanity checks even when @delay is 0, which should be fine for any sane use cases. Unfortunately, megaraid was being overly ingenious. It seemingly wanted to use cancel_delayed_work_sync() before cancel_work_sync() was introduced, but didn't want to waste the space for full delayed_work as it was only going to use 0 @delay. So, it only allocated space for struct work_struct and then cast it to struct delayed_work and passed it into delayed_work functions - truly awesome engineering tradeoff to save some bytes. Xiaotian fixed it by making megraid allocate full delayed_work for now. It should be converted to use work_struct and cancel_work_sync() but I think we better do that after 3.7. I added another commit to change BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that the kernel doesn't crash even if there are more such abuses." * 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s megaraid: fix BUG_ON() from incorrect use of delayed work
2012-12-04workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()sTejun Heo1-2/+2
8852aac25e ("workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in megaraid - it allocated work_struct, casted it to delayed_work and then pass that into queue_delayed_work(). Previously, this was okay because 0 @delay short-circuited to queue_work() before doing anything with delayed_work. 8852aac25e moved 0 @delay test into __queue_delayed_work() after sanity check on delayed_work making megaraid trigger BUG_ON(). Although megaraid is already fixed by c1d390d8e6 ("megaraid: fix BUG_ON() from incorrect use of delayed work"), this patch converts BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such abusers, if there are more, trigger warning but don't crash the machine. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Xiaotian Feng <xtfeng@gmail.com>
2012-12-03Revert "sched, autogroup: Stop going ahead if autogroup is disabled"Mike Galbraith2-9/+0
This reverts commit 800d4d30c8f20bd728e5741a3b77c4859a613f7c. Between commits 8323f26ce342 ("sched: Fix race in task_group()") and 800d4d30c8f2 ("sched, autogroup: Stop going ahead if autogroup is disabled"), autogroup is a wreck. With both applied, all you have to do to crash a box is disable autogroup during boot up, then reboot.. boom, NULL pointer dereference due to commit 800d4d30c8f2 not allowing autogroup to move things, and commit 8323f26ce342 making that the only way to switch runqueues: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81063ac0>] effective_load.isra.43+0x50/0x90 Pid: 7047, comm: systemd-user-se Not tainted 3.6.8-smp #7 MEDIONPC MS-7502/MS-7502 RIP: effective_load.isra.43+0x50/0x90 Process systemd-user-se (pid: 7047, threadinfo ffff880221dde000, task ffff88022618b3a0) Call Trace: select_task_rq_fair+0x255/0x780 try_to_wake_up+0x156/0x2c0 wake_up_state+0xb/0x10 signal_wake_up+0x28/0x40 complete_signal+0x1d6/0x250 __send_signal+0x170/0x310 send_signal+0x40/0x80 do_send_sig_info+0x47/0x90 group_send_sig_info+0x4a/0x70 kill_pid_info+0x3a/0x60 sys_kill+0x97/0x1a0 ? vfs_read+0x120/0x160 ? sys_read+0x45/0x90 system_call_fastpath+0x16/0x1b Code: 49 0f af 41 50 31 d2 49 f7 f0 48 83 f8 01 48 0f 46 c6 48 2b 07 48 8b bf 40 01 00 00 48 85 ff 74 3a 45 31 c0 48 8b 8f 50 01 00 00 <48> 8b 11 4c 8b 89 80 00 00 00 49 89 d2 48 01 d0 45 8b 59 58 4c RIP [<ffffffff81063ac0>] effective_load.isra.43+0x50/0x90 RSP <ffff880221ddfbd8> CR2: 0000000000000000 Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org # 2.6.39+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-03cgroup: remove subsystem files when remounting cgroupGao feng1-2/+9
cgroup_clear_directroy is called by cgroup_d_remove_dir and cgroup_remount. when we call cgroup_remount to remount the cgroup,the subsystem may be unlinked from cgroupfs_root->subsys_list in rebind_subsystem,this subsystem's files will not be removed in cgroup_clear_directroy. And the system will panic when we try to access these files. this patch removes subsystems's files before rebind_subsystems, if rebind_subsystems failed, repopulate these removed files. With help from Tejun. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-12-03Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcuIngo Molnar14-352/+1014
Conflicts: arch/x86/kernel/ptrace.c Pull the latest RCU tree from Paul E. McKenney: " The major features of this series are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341, and are at branch rcu/tracing. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315, and are at branch rcu/stall. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309, along with a late-breaking change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID <20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org seems to have missed. These are at branch rcu/fixes. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486. This is at rcu/next. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-03modsign: add symbol prefix to certificate listJames Hogan1-2/+2
Add the arch symbol prefix (if applicable) to the asm definition of modsign_certificate_list and modsign_certificate_list_end. This uses the recently defined SYMBOL_PREFIX which is derived from CONFIG_SYMBOL_PREFIX. This fixes the build of module signing on the blackfin and metag architectures. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Howells <dhowells@redhat.com> Cc: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-01workqueue: add WARN_ON_ONCE() on CPU number to wq_worker_waking_up()Joonsoo Kim1-1/+3
Recently, workqueue code has gone through some changes and we found some bugs related to concurrency management operations happening on the wrong CPU. When a worker is concurrency managed (!WORKER_NOT_RUNNIG), it should be bound to its associated cpu and woken up to that cpu. Add WARN_ON_ONCE() to verify this. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-12-01workqueue: trivial fix for return statement in work_busy()Joonsoo Kim1-1/+1
Return type of work_busy() is unsigned int. There is return statement returning boolean value, 'false' in work_busy(). It is not problem, because 'false' may be treated '0'. However, fixing it would make code robust. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-12-01workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delayTejun Heo1-3/+11
8376fe22c7 ("workqueue: implement mod_delayed_work[_on]()") implemented mod_delayed_work[_on]() using the improved try_to_grab_pending(). The function is later used, among others, to replace [__]candel_delayed_work() + queue_delayed_work() combinations. Unfortunately, a delayed_work item w/ zero @delay is handled slightly differently by mod_delayed_work_on() compared to queue_delayed_work_on(). The latter skips timer altogether and directly queues it using queue_work_on() while the former schedules timer which will expire on the closest tick. This means, when @delay is zero, that [__]cancel_delayed_work() + queue_delayed_work_on() makes the target item immediately executable while mod_delayed_work_on() may induce delay of upto a full tick. This somewhat subtle difference breaks some of the converted users. e.g. block queue plugging uses delayed_work for deferred processing and uses mod_delayed_work_on() when the queue needs to be immediately unplugged. The above problem manifested as noticeably higher number of context switches under certain circumstances. The difference in behavior was caused by missing special case handling for 0 delay in mod_delayed_work_on() compared to queue_delayed_work_on(). Joonsoo Kim posted a patch to add it - ("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1]. The patch was queued for 3.8 but it was described as optimization and I missed that it was a correctness issue. As both queue_delayed_work_on() and mod_delayed_work_on() use __queue_delayed_work() for queueing, it seems that the better approach is to move the 0 delay special handling to the function instead of duplicating it in mod_delayed_work_on(). Fix the problem by moving 0 delay special case handling from queue_delayed_work_on() to __queue_delayed_work(). This replaces Joonsoo's patch. [1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Anders Kaseorg <andersk@MIT.EDU> Reported-and-tested-by: Zlatko Calusic <zlatko.calusic@iskon.hr> LKML-Reference: <alpine.DEB.2.00.1211280953350.26602@dr-wily.mit.edu> LKML-Reference: <50A78AA9.5040904@iskon.hr> Cc: Joonsoo Kim <js1304@gmail.com>
2012-12-01workqueue: exit rescuer_thread() as TASK_RUNNINGMike Galbraith1-1/+3
A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling off, never to be seen again. In the case where this occurred, an exiting thread hit reiserfs homebrew conditional resched while holding a mutex, bringing the box to its knees. PID: 18105 TASK: ffff8807fd412180 CPU: 5 COMMAND: "kdmflush" #0 [ffff8808157e7670] schedule at ffffffff8143f489 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs] #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14 #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs] #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2 #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41 #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88 #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850 #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f [exception RIP: kernel_thread_helper] RIP: ffffffff8144a5c0 RSP: ffff8808157e7f58 RFLAGS: 00000202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8107af60 RDI: ffff8803ee491d18 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2012-12-01Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-5/+7
Pull perf fixes from Ingo Molnar: "This is mostly about unbreaking architectures that took the UAPI changes in the v3.7 cycle, plus misc fixes." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf kvm: Fix building perf kvm on non x86 arches perf kvm: Rename perf_kvm to perf_kvm_stat perf: Make perf build for x86 with UAPI disintegration applied perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing x86: Export asm/{svm.h,vmx.h,perf_regs.h} perf tools: Fix strbuf_addf() when the buffer needs to grow perf header: Fix numa topology printing perf, powerpc: Fix hw breakpoints returning -ENOSPC
2012-11-30cgroup: use cgroup_addrm_files() in cgroup_clear_directory()Gao feng1-1/+3
cgroup_clear_directory() incorrectly invokes cgroup_rm_file() on each cftset of the target subsystems, which only removes the first file of each set. This leaves dangling files after subsystems are removed from a cgroup root via remount. Use cgroup_addrm_files() to remove all files of target subsystems. tj: Move cgroup_addrm_files() prototype decl upwards next to other global declarations. Commit message updated. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-30context_tracking: New context tracking susbsystemFrederic Weisbecker4-67/+92
Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-30ring-buffer: Fix race between integrity check and readersSteven Rostedt1-1/+6
The function rb_check_pages() was added to make sure the ring buffer's pages were sane. This check is done when the ring buffer size is modified as well as when the iterator is released (closing the "trace" file), as that was considered a non fast path and a good place to do a sanity check. The problem is that the check does not have any locks around it. If one process were to read the trace file, and another were to read the raw binary file, the check could happen while the reader is reading the file. The issues with this is that the check requires to clear the HEAD page before doing the full check and it restores it afterward. But readers require the HEAD page to exist before it can read the buffer, otherwise it gives a nasty warning and disables the buffer. By adding the reader lock around the check, this keeps the race from happening. Cc: stable@vger.kernel.org # 3.6 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-30ring-buffer: Fix NULL pointer if rb_set_head_page() failsSteven Rostedt1-2/+7
The function rb_set_head_page() searches the list of ring buffer pages for a the page that has the HEAD page flag set. If it does not find it, it will do a WARN_ON(), disable the ring buffer and return NULL, as this should never happen. But if this bug happens to happen, not all callers of this function can handle a NULL pointer being returned from it. That needs to be fixed. Cc: stable@vger.kernel.org # 3.0+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-30cgroup: warn about broken hierarchies only after css_onlineGlauber Costa1-9/+9
If everything goes right, it shouldn't really matter if we are spitting this warning after css_alloc or css_online. If we fail between then, there are some ill cases where we would previously see the message and now we won't (like if the files fail to be created). I believe it really shouldn't matter: this message is intended in spirit to be shown when creation succeeds, but with insane settings. Signed-off-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-30irqdomain: stop screaming about preallocated irqdescsLinus Walleij1-2/+2
In the simple irqdomain: don't shout warnings to the user, there is no point. An informational print is sufficient. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>