aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2012-01-17audit: remove audit_finish_fork as it can't be calledEric Paris2-22/+0
Audit entry,always rules are not allowed and are automatically changed in exit,always rules in userspace. The kernel refuses to load such rules. Thus a task in the middle of a syscall (and thus in audit_finish_fork()) can only be in one of two states: AUDIT_BUILD_CONTEXT or AUDIT_DISABLED. Since the current task cannot be in AUDIT_RECORD_CONTEXT we aren't every going to actually use the code in audit_finish_fork() since it will return without doing anything. Thus drop the code. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: reject entry,always rulesEric Paris1-2/+4
We deprecated entry,always rules a long time ago. Reject those rules as invalid. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: inline audit_free to simplify the look of generic codeEric Paris2-3/+2
make the conditional a static inline instead of doing it in generic code. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: inline checks for not needing to collect aux recordsEric Paris1-12/+3
A number of audit hooks make function calls before they determine that auxilary records do not need to be collected. Do those checks as static inlines since the most common case is going to be that records are not needed and we can skip the function call overhead. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: drop some potentially inadvisable likely notationsEric Paris1-6/+6
The audit code makes heavy use of likely() and unlikely() macros, but they don't always make sense. Drop any that seem questionable and let the computer do it's thing. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: remove AUDIT_SETUP_CONTEXT as it isn't usedEric Paris1-5/+1
Audit contexts have 3 states. Disabled, which doesn't collect anything, build, which collects info but might not emit it, and record, which collects and emits. There is a 4th state, setup, which isn't used. Get rid of it. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: inline audit_syscall_entry to reduce burden on archsEric Paris1-1/+1
Every arch calls: if (unlikely(current->audit_context)) audit_syscall_entry() which requires knowledge about audit (the existance of audit_context) in the arch code. Just do it all in static inline in audit.h so that arch's can remain blissfully ignorant. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17Audit: push audit success and retcode into arch ptrace.hEric Paris1-4/+12
The audit system previously expected arches calling to audit_syscall_exit to supply as arguments if the syscall was a success and what the return code was. Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things by converting from negative retcodes to an audit internal magic value stating success or failure. This helper was wrong and could indicate that a valid pointer returned to userspace was a failed syscall. The fix is to fix the layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it in turns calls back into arch code to collect the return value and to determine if the syscall was a success or failure. We also define a generic is_syscall_success() macro which determines success/failure based on if the value is < -MAX_ERRNO. This works for arches like x86 which do not use a separate mechanism to indicate syscall failure. We make both the is_syscall_success() and regs_return_value() static inlines instead of macros. The reason is because the audit function must take a void* for the regs. (uml calls theirs struct uml_pt_regs instead of just struct pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit function takes a void* we need to use static inlines to cast it back to the arch correct structure to dereference it. The other major change is that on some arches, like ia64, MIPS and ppc, we change regs_return_value() to give us the negative value on syscall failure. THE only other user of this macro, kretprobe_example.c, won't notice and it makes the value signed consistently for the audit functions across all archs. In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old audit code as the return value. But the ptrace_64.h code defined the macro regs_return_value() as regs[3]. I have no idea which one is correct, but this patch now uses the regs_return_value() function, so it now uses regs[3]. For powerpc we previously used regs->result but now use the regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is always positive so the regs_return_value(), much like ia64 makes it negative before calling the audit code when appropriate. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion] Acked-by: Tony Luck <tony.luck@intel.com> [for ia64] Acked-by: Richard Weinberger <richard@nod.at> [for uml] Acked-by: David S. Miller <davem@davemloft.net> [for sparc] Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips] Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
2012-01-17seccomp: audit abnormal end to a process due to seccompEric Paris2-21/+31
The audit system likes to collect information about processes that end abnormally (SIGSEGV) as this may me useful intrusion detection information. This patch adds audit support to collect information when seccomp forces a task to exit because of misbehavior in a similar way. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: check current inode and containing object when filtering on major and minorEric Paris1-10/+14
The audit system has the ability to filter on the major and minor number of the device containing the inode being operated upon. Lets say that /dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot. Now lets say we add a watch with a filter on 8,1. If we proceed to open an inode inside /boot, such as /vboot/vmlinuz, we will match the major,minor filter. Lets instead assume that one were to use a tool like debugfs and were to open /dev/sda1 directly and to modify it's contents. We might hope that this would also be logged, but it isn't. The rules will check the major,minor of the device containing /dev/sda1. In other words the rule would match on the major/minor of the tmpfs mounted at /dev. I believe these rules should trigger on either device. The man page is devoid of useful information about the intended semantics. It only seems logical that if you want to know everything that happened on a major,minor that would include things that happened to the device itself... Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: drop the meaningless and format breaking word 'user'Eric Paris1-1/+1
userspace audit messages look like so: type=USER msg=audit(1271170549.415:24710): user pid=14722 uid=0 auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 msg='' That third field just says 'user'. That's useless and doesn't follow the key=value pair we are trying to enforce. We already know it came from the user based on the record type. Kill that word. Die. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: dynamically allocate audit_names when not enough space is in the names arrayEric Paris1-188/+215
This patch does 2 things. First it reduces the number of audit_names allocated in every audit context from 20 to 5. 5 should be enough for all 'normal' syscalls (rename being the worst). Some syscalls can still touch more the 5 inodes such as mount. When rpc filesystem is mounted it will create inodes and those can exceed 5. To handle that problem this patch will dynamically allocate audit_names if it needs more than 5. This should decrease the typicall memory usage while still supporting all the possible kernel operations. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: make filetype matching consistent with other filtersEric Paris2-12/+11
Every other filter that matches part of the inodes list collected by audit will match against any of the inodes on that list. The filetype matching however had a strange way of doing things. It allowed userspace to indicated if it should match on the first of the second name collected by the kernel. Name collection ordering seems like a kernel internal and making userspace rules get that right just seems like a bad idea. As it turns out the userspace audit writers had no idea it was doing this and thus never overloaded the value field. The kernel always checked the first name collected which for the tested rules was always correct. This patch just makes the filetype matching like the major, minor, inode, and LSM rules in that it will match against any of the names collected. It also changes the rule validation to reject the old unused rule types. Noone knew it was there. Noone used it. Why keep around the extra code? Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17Revert "capabitlies: ns_capable can use the cap helpers rather than lsm call"Linus Torvalds1-1/+1
This reverts commit d2a7009f0bb03fa22ad08dd25472efa0568126b9. J. R. Okajima explains: "After this commit, I am afraid access(2) on NFS may not work correctly. The scenario based upon my guess. - access(2) overrides the credentials. - calls inode_permission() -- ... -- generic_permission() -- ns_capable(). - while the old ns_capable() calls security_capable(current_cred()), the new ns_capable() calls has_ns_capability(current) -- security_capable(__task_cred(t)). current_cred() returns current->cred which is effective (overridden) credentials, but __task_cred(current) returns current->real_cred (the NFSD's credential). And the overridden credentials by access(2) lost." Requested-by: J. R. Okajima <hooanon05@yahoo.co.jp> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-17Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgentIngo Molnar1-4/+4
2012-01-16Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds2-8/+8
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Hibernate: Drop the check of swap space size for compressed image PM / shmobile: fix A3SP suspend method PM / Domains: Skip governor functions for CONFIG_PM_RUNTIME unset PM / Domains: Fix build for CONFIG_PM_SLEEP unset PM: Make sysrq-o be available for CONFIG_PM unset
2012-01-16rcu: Add missing __cpuinit annotation in rcutorture codeHeiko Carstens1-2/+2
"rcu: Add rcutorture CPU-hotplug capability" adds cpu hotplug operations to the rcutorture code but produces a false positive warning about section mismatches: WARNING: vmlinux.o(.text+0x1e420c): Section mismatch in reference from the function rcu_torture_onoff() to the function .cpuinit.text:cpu_up() The function rcu_torture_onoff() references the function __cpuinit cpu_up(). This is often because rcu_torture_onoff lacks a __cpuinit annotation or the annotation of cpu_up is wrong. This commit therefore adds a __cpuinit annotation so the warning goes away. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-01-16rcu: Make rcutorture bool parameters really bool (core code)Rusty Russell1-2/+2
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. This commit makes this change to rcutorture. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-01-16Merge branch 'rcu/fixes-for-v3.2' into rcu/urgentPaul E. McKenney17-243/+933
Merge reason: Add these commits so that fixes on this branch do not conflict with already-mainlined code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-01-16tracepoints/module: Fix disabling tracepoints with taint CRAP or OOTSteven Rostedt1-3/+4
Tracepoints are disabled for tainted modules, which is usually because the module is either proprietary or was forced, and we don't want either of them using kernel tracepoints. But, a module can also be tainted by being in the staging directory or compiled out of tree. Either is fine for use with tracepoints, no need to punish them. I found this out when I noticed that my sample trace event module, when done out of tree, stopped working. Cc: stable@vger.kernel.org # 3.2 Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Dave Jones <davej@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-01-15error: implicit declaration of function 'module_flags_taint'Kevin Winchester1-20/+20
Recent changes to kernel/module.c caused the following compile error: kernel/module.c: In function ‘show_taint’: kernel/module.c:1024:2: error: implicit declaration of function ‘module_flags_taint’ [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Correct this error by moving the definition of module_flags_taint outside of the #ifdef CONFIG_MODULE_UNLOAD section. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-15Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+5
* 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits) Revert "block: recursive merge requests" block: Stop using macro stubs for the bio data integrity calls blockdev: convert some macros to static inlines fs: remove unneeded plug in mpage_readpages() block: Add BLKROTATIONAL ioctl block: Introduce blk_set_stacking_limits function block: remove WARN_ON_ONCE() in exit_io_context() block: an exiting task should be allowed to create io_context block: ioc_cgroup_changed() needs to be exported block: recursive merge requests block, cfq: fix empty queue crash caused by request merge block, cfq: move icq creation and rq->elv.icq association to block core block, cfq: restructure io_cq creation path for io_context interface cleanup block, cfq: move io_cq exit/release to blk-ioc.c block, cfq: move icq cache management to block core block, cfq: move io_cq lookup to blk-ioc.c block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq block, cfq: reorganize cfq_io_context into generic and cfq specific parts block: remove elevator_queue->ops block: reorder elevator switch sequence ... Fix up conflicts in: - block/blk-cgroup.c Switch from can_attach_task to can_attach - block/cfq-iosched.c conflict with now removed cic index changes (we now use q->id instead)
2012-01-15Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-345/+683
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) perf tools: Fix compile error on x86_64 Ubuntu perf report: Fix --stdio output alignment when --showcpuutilization used perf annotate: Get rid of field_sep check perf annotate: Fix usage string perf kmem: Fix a memory leak perf kmem: Add missing closedir() calls perf top: Add error message for EMFILE perf test: Change type of '-v' option to INCR perf script: Add missing closedir() calls tracing: Fix compile error when static ftrace is enabled recordmcount: Fix handling of elf64 big-endian objects. perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again perf tools: Add support for guest/host-only profiling perf kvm: Do guest-only counting by default perf top: Don't update total_period on process_sample perf hists: Stop using 'self' for struct hist_entry perf hists: Rename total_session to total_period x86: Add counter when debug stack is used with interrupts enabled x86: Allow NMIs to hit breakpoints in i386 x86: Keep current stack in NMI breakpoints ...
2012-01-14Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds4-40/+60
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: capabilities: remove __cap_full_set definition security: remove the security_netlink_recv hook as it is equivalent to capable() ptrace: do not audit capability check when outputing /proc/pid/stat capabilities: remove task_ns_* functions capabitlies: ns_capable can use the cap helpers rather than lsm call capabilities: style only - move capable below ns_capable capabilites: introduce new has_ns_capabilities_noaudit capabilities: call has_ns_capability from has_capability capabilities: remove all _real_ interfaces capabilities: introduce security_capable_noaudit capabilities: reverse arguments to security_capable capabilities: remove the task from capable LSM hook entirely selinux: sparse fix: fix several warnings in the security server cod selinux: sparse fix: fix warnings in netlink code selinux: sparse fix: eliminate warnings for selinuxfs selinux: sparse fix: declare selinux_disable() in security.h selinux: sparse fix: move selinux_complete_init selinux: sparse fix: make selinux_secmark_refcount static SELinux: Fix RCU deref check warning in sel_netport_insert() Manually fix up a semantic mis-merge wrt security_netlink_recv(): - the interface was removed in commit fd7784615248 ("security: remove the security_netlink_recv hook as it is equivalent to capable()") - a new user of it appeared in commit a38f7907b926 ("crypto: Add userspace configuration API") causing no automatic merge conflict, but Eric Paris pointed out the issue.
2012-01-14Merge tag 'for-linus' of git://github.com/rustyrussell/linuxLinus Torvalds7-115/+146
Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 * tag 'for-linus' of git://github.com/rustyrussell/linux: module_param: check that bool parameters really are bool. intelfbdrv.c: bailearly is an int module_param paride/pcd: fix bool verbose module parameter. module_param: make bool parameters really bool (drivers & misc) module_param: make bool parameters really bool (arch) module_param: make bool parameters really bool (core code) kernel/async: remove redundant declaration. printk: fix unnecessary module_param_name. lirc_parallel: fix module parameter description. module_param: avoid bool abuse, add bint for special cases. module_param: check type correctness for module_param_array modpost: use linker section to generate table. modpost: use a table rather than a giant if/else statement. modules: sysfs - export: taint, coresize, initsize kernel/params: replace DEBUGP with pr_debug module: replace DEBUGP with pr_debug module: struct module_ref should contains long fields module: Fix performance regression on modules with large symbol tables module: Add comments describing how the "strmap" logic works Fix up conflicts in scripts/mod/file2alias.c due to the new linker- generated table approach to adding __mod_*_device_table entries. The ARM sa11x0 mcp bus needed to be converted to that too.
2012-01-14PM / Hibernate: Drop the check of swap space size for compressed imageBarry Song1-6/+7
For compressed image, the space required is not known until we finish compressing and writing all pages. This patch drops the check, and if swap space is not enough finally, system can still restore to normal after writing swap fails for compressed images. Signed-off-by: Barry Song <Baohua.Song@csr.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-01-14PM: Make sysrq-o be available for CONFIG_PM unsetRafael J. Wysocki1-2/+1
After commit 1eb208aea3179dd2fc0cdeea45ef869d75b4fe70, "PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME)", the files under kernel/power are not built unless CONFIG_PM_SLEEP or CONFIG_PM_RUNTIME is set. In particular, this causes kernel/power/poweroff.c to be omitted, even though it should be compiled, because CONFIG_MAGIC_SYSRQ is set. Fix the problem by causing kernel/power/Makefile to be processed for CONFIG_PM unset too. Reported-and-tested-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-01-12c/r: prctl: add PR_SET_MM codes to set up mm_struct entriesCyrill Gorcunov1-0/+121
When we restore a task we need to set up text, data and data heap sizes from userspace to the values a task had at checkpoint time. This patch adds auxilary prctl codes for that. While most of them have a statistical nature (their values are involved into calculation of /proc/<pid>/statm output) the start_brk and brk values are used to compute an allowed size of program data segment expansion. Which means an arbitrary changes of this values might be dangerous operation. So to restrict access the following requirements applied to prctl calls: - The process has to have CAP_SYS_ADMIN capability granted. - For all opcodes except start_brk/brk members an appropriate VMA area must exist and should fit certain VMA flags, such as: - code segment must be executable but not writable; - data segment must not be executable. start_brk/brk values must not intersect with data segment and must not exceed RLIMIT_DATA resource limit. Still the main guard is CAP_SYS_ADMIN capability check. Note the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE support otherwise these prctl calls will return -EINVAL. [akpm@linux-foundation.org: cache current->mm in a local, saving 200 bytes text] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12panic: don't print redundant backtraces on oopsAndi Kleen1-1/+5
When an oops causes a panic and panic prints another backtrace it's pretty common to have the original oops data be scrolled away on a 80x50 screen. The second backtrace is quite redundant and not needed anyways. So don't print the panic backtrace when oops_in_progress is true. [akpm@linux-foundation.org: add comment] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12sysctl: add the kernel.ns_last_pid controlPavel Emelyanov2-1/+34
The sysctl works on the current task's pid namespace, getting and setting its last_pid field. Writing is allowed for CAP_SYS_ADMIN-capable tasks thus making it possible to create a task with desired pid value. This ability is required badly for the checkpoint/restore in userspace. This approach suits all the parties for now. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kdump: fix crash_kexec()/smp_send_stop() race in panic()Michael Holzheu1-1/+17
When two CPUs call panic at the same time there is a possible race condition that can stop kdump. The first CPU calls crash_kexec() and the second CPU calls smp_send_stop() in panic() before crash_kexec() finished on the first CPU. So the second CPU stops the first CPU and therefore kdump fails: 1st CPU: panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump 2nd CPU: panic()->crash_kexec()->kexec_mutex already held by 1st CPU ->smp_send_stop()-> stop 1st CPU (stop kdump) This patch fixes the problem by introducing a spinlock in panic that allows only one CPU to process crash_kexec() and the subsequent panic code. All other CPUs call the weak function panic_smp_self_stop() that stops the CPU itself. This function can be overloaded by architecture code. For example "tile" can use their lower-power "nap" instruction for that. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kdump: crashk_res init check for /sys/kernel/kexec_crash_sizeMichael Holzheu1-5/+4
Currently it is possible to set the crash_size via the sysfs /sys/kernel/kexec_crash_size even if no crash kernel memory has been defined with the "crashkernel" parameter. In this case "crashk_res" is not initialized and crashk_res.start = crashk_res.end = 0. Unfortunately resource_size(&crashk_res) returns 1 in this case. This breaks the s390 implementation of crash_(un)map_reserved_pages(). To fix the problem the correct "old_size" is now calculated in crash_shrink_memory(). "old_size is set to "0" if crashk_res is not initialized. With this change crash_shrink_memory() will do nothing, when "crashk_res" is not initialized. It will return "0" for "echo 0 > /sys/kernel/kexec_crash_size" and -EINVAL for "echo [not zero] > /sys/kernel/kexec_crash_size". In addition to that this patch also simplifies the "ret = -EINVAL" vs. "ret = 0" logic as suggested by Simon Horman. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: Dave Young <dyoung@redhat.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kdump: add missing RAM resource in crash_shrink_memory()Michael Holzheu1-0/+15
When shrinking crashkernel memory using /sys/kernel/kexec_crash_size for the newly added memory no RAM resource is created at the moment. Example: $ cat /proc/iomem 00000000-bfffffff : System RAM 00000000-005b7ac3 : Kernel code 005b7ac4-009743bf : Kernel data 009bb000-00a85c33 : Kernel bss c0000000-cfffffff : Crash kernel d0000000-ffffffff : System RAM $ echo 0 > /sys/kernel/kexec_crash_size $ cat /proc/iomem 00000000-bfffffff : System RAM 00000000-005b7ac3 : Kernel code 005b7ac4-009743bf : Kernel data 009bb000-00a85c33 : Kernel bss <<-- here is System RAM missing d0000000-ffffffff : System RAM One result of this bug is that the memory chunk can never be set offline using memory hotplug. With this patch I insert a new "System RAM" resource for the released memory. Then the upper example looks like the following: $ echo 0 > /sys/kernel/kexec_crash_size $ cat /proc/iomem 00000000-bfffffff : System RAM 00000000-005b7ac3 : Kernel code 005b7ac4-009743bf : Kernel data 009bb000-00a85c33 : Kernel bss c0000000-cfffffff : System RAM <<-- new rescoure d0000000-ffffffff : System RAM And now I can set chunk c0000000-cfffffff offline. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kexec: remove KMSG_DUMP_KEXECWANG Cong1-3/+0
KMSG_DUMP_KEXEC is useless because we already save kernel messages inside /proc/vmcore, and it is unsafe to allow modules to do other stuffs in a crash dump scenario. [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12treewide: remove useless NORET_TYPE macro and usesJoe Perches2-4/+4
It's a very old and now unused prototype marking so just delete it. Neaten panic pointer argument style to keep checkpatch quiet. Signed-off-by: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kprobes: silence DEBUG_STRICT_USER_COPY_CHECKS=y warningStephen Boyd1-1/+1
Enabling DEBUG_STRICT_USER_COPY_CHECKS causes the following warning: In file included from arch/x86/include/asm/uaccess.h:573, from kernel/kprobes.c:55: In function 'copy_from_user', inlined from 'write_enabled_file_bool' at kernel/kprobes.c:2191: arch/x86/include/asm/uaccess_64.h:65: warning: call to 'copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct presumably due to buf_size being signed causing GCC to fail to see that buf_size can't become negative. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-13module_param: make bool parameters really bool (core code)Rusty Russell3-6/+6
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13kernel/async: remove redundant declaration.Rusty Russell1-2/+0
It's in linux/init.h, and I'm about to change it to a bool. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13printk: fix unnecessary module_param_name.Rusty Russell1-1/+1
You don't need module_param_name if the name is the same! Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module_param: avoid bool abuse, add bint for special cases.Rusty Russell1-0/+24
For historical reasons, we allow module_param(bool) to take an int (or an unsigned int). That's going away. A few drivers really want an int: they set it to -1 and a parameter will set it to 0 or 1. This sucks: reading them from sysfs will give 'Y' for both -1 and 1, but if we change it to an int, then the users might be broken (if they did "param" instead of "param=1"). Use a new 'bint' parser for them. (ntfs has a different problem: it needs an int for debug_msgs because it's also exposed via sysctl.) Cc: Steve Glendinning <steve.glendinning@smsc.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux390@de.ibm.com Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: lm-sensors@lm-sensors.org Cc: linux-rdma@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: alsa-devel@alsa-project.org Acked-by: Takashi Iwai <tiwai@suse.de> (For the sound part) Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> (For the hwmon driver) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13modules: sysfs - export: taint, coresize, initsizeKay Sievers1-29/+64
Recent tools do not want to use /proc to retrieve module information. A few values are currently missing from sysfs to replace the information available in /proc/modules. This adds /sys/module/*/{coresize,initsize,taint} attributes. TAINT_PROPRIETARY_MODULE (P) and TAINT_OOT_MODULE (O) flags are both always shown now, and do no longer exclude each other, also in /proc/modules. Replace the open-coded sysfs attribute initializers with the __ATTR() macro. Add the new attributes to Documentation/ABI. Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13kernel/params: replace DEBUGP with pr_debugJim Cromie1-10/+4
Use more flexible pr_debug. This allows: echo "module params +p" > /dbg/dynamic_debug/control to turn on debug messages when needed. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module: replace DEBUGP with pr_debugJim Cromie1-26/+20
Use more flexible pr_debug. This allows: echo "module module +p" > /dbg/dynamic_debug/control to turn on debug messages when needed. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module: struct module_ref should contains long fieldsEric Dumazet2-5/+5
module_ref contains two "unsigned int" fields. Thats now too small, since some machines can open more than 2^32 files. Check commit 518de9b39e8 (fs: allow for more than 2^31 files) for reference. We can add an aligned(2 * sizeof(unsigned long)) attribute to force alloc_percpu() allocating module_ref areas in single cache lines. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Tejun Heo <tj@kernel.org> CC: Robin Holt <holt@sgi.com> CC: David Miller <davem@davemloft.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module: Fix performance regression on modules with large symbol tablesKevin Cernekee1-44/+21
Looking at /proc/kallsyms, one starts to ponder whether all of the extra strtab-related complexity in module.c is worth the memory savings. Instead of making the add_kallsyms() loop even more complex, I tried the other route of deleting the strmap logic and naively copying each string into core_strtab with no consideration for consolidating duplicates. Performance on an "already exists" insmod of nvidia.ko (runs add_kallsyms() but does not actually initialize the module): Original scheme: 1.230s With naive copying: 0.058s Extra space used: 35k (of a 408k module). Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <73defb5e4bca04a6431392cc341112b1@localhost>
2012-01-13module: Add comments describing how the "strmap" logic worksKevin Cernekee1-0/+9
Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-11Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-10/+7
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix lockup by limiting load-balance retries on lock-break sched: Fix CONFIG_CGROUP_SCHED dependency sched: Remove empty #ifdefs
2012-01-11Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+9
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, reboot: Fix typo in nmi reboot path x86, NMI: Add to_cpumask() to silence compile warning x86, NMI: NMI selftest depends on the local apic x86: Add stack top margin for stack overflow checking x86, NMI: NMI-selftest should handle the UP case properly x86: Fix the 32-bit stackoverflow-debug build x86, NMI: Add knob to disable using NMI IPIs to stop cpus x86, NMI: Add NMI IPI selftest x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus x86: Clean up the range of stack overflow checking x86: Panic on detection of stack overflow x86: Check stack overflow in detail
2012-01-11sched: Fix lockup by limiting load-balance retries on lock-breakPeter Zijlstra1-3/+7
Eric and David reported dead machines and traced it to commit a195f004 ("sched: Fix load-balance lock-breaking"), it turns out there's still a scenario where we can end up re-trying forever. Since there is no strict forward progress guarantee in the load-balance iteration we can get stuck re-retrying the same task-set over and over. Creating a forward progress guarantee with the existing structure is somewhat non-trivial, for now simply terminate the retry loop after a few tries. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: David Ahern <dsahern@gmail.com> [ logic cleanup as suggested by Eric ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1326297936.2442.157.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-10Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linuxLinus Torvalds2-0/+4
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c writeback: balanced_rate cannot exceed write bandwidth writeback: do strict bdi dirty_exceeded writeback: avoid tiny dirty poll intervals writeback: max, min and target dirty pause time writeback: dirty ratelimit - think time compensation btrfs: fix dirtied pages accounting on sub-page writes writeback: fix dirtied pages accounting on redirty writeback: fix dirtied pages accounting on sub-page writes writeback: charge leaked page dirties to active tasks writeback: Include all dirty inodes in background writeback