linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2011-03-23	job control: Add @for_ptrace to do_notify_parent_cldstop()	Tejun Heo	1	-7/+24
	Currently, do_notify_parent_cldstop() determines whether the notification is for the real parent or ptracer. Move the decision to the caller by adding @for_ptrace parameter to do_notify_parent_cldstop(). All the callers are updated to pass task_ptrace(target_task), so this patch doesn't cause any behavior difference. While at it, add function comment to do_notify_parent_cldstop(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2011-03-23	job control: Don't set group_stop exit_code if re-entering job control stop	Tejun Heo	1	-3/+20
	While ptraced, a task may be resumed while the containing process is still job control stopped. If the task receives another stop signal in this state, it will still initiate group stop, which generates group_exit_code, which the real parent would be able to see once the ptracer detaches. In this scenario, the real parent may see two consecutive CLD_STOPPED events from two stop signals without intervening SIGCONT, which normally is impossible. Test case follows. #include <stdio.h> #include <unistd.h> #include <sys/ptrace.h> #include <sys/wait.h> int main(void) { pid_t tracee; siginfo_t si; tracee = fork(); if (!tracee) while (1) pause(); kill(tracee, SIGSTOP); waitid(P_PID, tracee, &si, WSTOPPED); if (!fork()) { ptrace(PTRACE_ATTACH, tracee, NULL, NULL); waitid(P_PID, tracee, &si, WSTOPPED); ptrace(PTRACE_CONT, tracee, NULL, (void )(long)si.si_status); waitid(P_PID, tracee, &si, WSTOPPED); ptrace(PTRACE_CONT, tracee, NULL, (void )(long)si.si_status); waitid(P_PID, tracee, &si, WSTOPPED); ptrace(PTRACE_DETACH, tracee, NULL, NULL); return 0; } while (1) { si.si_pid = 0; waitid(P_PID, tracee, &si, WSTOPPED \| WNOHANG); if (si.si_pid) printf("st=%02d c=%02d\n", si.si_status, si.si_code); } return 0; } Before the patch, the latter waitid() in polling mode reports the second stopped event generated by the implied SIGSTOP of PTRACE_ATTACH. st=19 c=05 ^C After the patch, the second event is not reported. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2011-03-23	ptrace: Clean transitions between TASK_STOPPED and TRACED	Tejun Heo	1	-13/+66
	Currently, if the task is STOPPED on ptrace attach, it's left alone and the state is silently changed to TRACED on the next ptrace call. The behavior breaks the assumption that arch_ptrace_stop() is called before any task is poked by ptrace and is ugly in that a task manipulates the state of another task directly. With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and TRACED can be made clean. The tracer can use the flag to tell the tracee to retry stop on attach and detach. On retry, the tracee will enter the desired state in the correct way. The lower 16bits of task->group_stop is used to remember the signal number which caused the last group stop. This is used while retrying for ptrace attach as the original group_exit_code could have been consumed with wait(2) by then. As the real parent may wait(2) and consume the group_exit_code anytime, the group_exit_code needs to be saved separately so that it can be used when switching from regular sleep to ptrace_stop(). This is recorded in the lower 16bits of task->group_stop. If a task is already stopped and there's no intervening SIGCONT, a ptrace request immediately following a successful PTRACE_ATTACH should always succeed even if the tracer doesn't wait(2) for attach completion; however, with this change, the tracee might still be TASK_RUNNING trying to enter TASK_TRACED which would cause the following request to fail with -ESRCH. This intermediate state is hidden from the ptracer by setting GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait for it to clear on its signal->wait_chldexit. Completing the transition or getting killed clears TRAPPING and wakes up the tracer. Note that the STOPPED -> RUNNING -> TRACED transition is still visible to other threads which are in the same group as the ptracer and the reverse transition is visible to all. Please read the comments for details. Oleg: * Spotted a race condition where a task may retry group stop without proper bookkeeping. Fixed by redoing bookkeeping on retry. * Spotted that the transition is visible to userland in several different ways. Most are fixed with GROUP_STOP_TRAPPING. Unhandled corner case is documented. * Pointed out not setting GROUP_STOP_SIGMASK on an already stopped task would result in more consistent behavior. * Pointed out that calling ptrace_stop() from do_signal_stop() in TASK_STOPPED can race with group stop start logic and then confuse the TRAPPING wait in ptrace_check_attach(). ptrace_stop() is now called with TASK_RUNNING. * Suggested using signal->wait_chldexit instead of bit wait. * Spotted a race condition between TRACED transition and clearing of TRAPPING. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
2011-03-23	ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced	Tejun Heo	1	-18/+25
	A ptraced task would still stop at do_signal_stop() when it's stopping for stop signals and do_signal_stop() behaves the same whether the task is ptraced or not. However, in addition to stopping, ptrace_stop() also does ptrace specific stuff like calling architecture specific callbacks, so this behavior makes the code more fragile and difficult to understand. This patch makes do_signal_stop() test whether the task is ptraced and use ptrace_stop() if so. This renders tracehook_notify_jctl() rather pointless as the ptrace notification is now handled by ptrace_stop() regardless of the return value from the tracehook. It probably is a good idea to update it. This doesn't solve the whole problem as tasks already in stopped state would stay in the regular stop when ptrace attached. That part will be handled by the next patch. Oleg pointed out that this makes a userland-visible change. Before, SIGCONT would be able to wake up a task in group stop even if the task is ptraced if the tracer hasn't issued another ptrace command afterwards (as the next ptrace commands transitions the state into TASK_TRACED which ignores SIGCONT wakeups). With this and the next patch, SIGCONT may race with the transition into TASK_TRACED and is ignored if the tracee already entered TASK_TRACED. Another userland visible change of this and the next patch is that the ptracee's state would now be TASK_TRACED where it used to be TASK_STOPPED, which is visible via fs/proc. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
2011-03-23	ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for group stop	Tejun Heo	1	-3/+6
	Currently, ptrace_stop() unconditionally participates in group stop bookkeeping. This is unnecessary and inaccurate. Make it only participate if the task is trapping for group stop - ie. if @why is CLD_STOPPED. As ptrace_stop() currently is not used when trapping for group stop, this equals to disabling group stop participation from ptrace_stop(). A visible behavior change is increased likelihood of delayed group stop completion if the thread group contains one or more ptraced tasks. This is to preapre for further cleanup of the interaction between group stop and ptrace. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com>
2011-03-23	signal: Use GROUP_STOP_PENDING to stop once for a single group stop	Tejun Heo	1	-15/+21
	Currently task->signal->group_stop_count is used to decide whether to stop for group stop. However, if there is a task in the group which is taking a long time to stop, other tasks which are continued by ptrace would repeatedly stop for the same group stop until the group stop is complete. Conversely, if a ptraced task is in TASK_TRACED state, the debugger won't get notified of group stops which is inconsistent compared to the ptraced task in any other state. This patch introduces GROUP_STOP_PENDING which tracks whether a task is yet to stop for the group stop in progress. The flag is set when a group stop starts and cleared when the task stops the first time for the group stop, and consulted whenever whether the task should participate in a group stop needs to be determined. Note that now tasks in TASK_TRACED also participate in group stop. This results in the following behavior changes. * For a single group stop, a ptracer would see at most one stop reported. * A ptracee in TASK_TRACED now also participates in group stop and the tracer would get the notification. However, as a ptraced task could be in TASK_STOPPED state or any ptrace trap could consume group stop, the notification may still be missing. These will be addressed with further patches. * A ptracee may start a group stop while one is still in progress if the tracer let it continue with stop signal delivery. Group stop code handles this correctly. Oleg: * Spotted that a task might skip signal check even when its GROUP_STOP_PENDING is set. Fixed by updating recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of group_stop_count. * Pointed out that task->group_stop should be cleared whenever task->signal->group_stop_count is cleared. Fixed accordingly. * Pointed out the behavior inconsistency between TASK_TRACED and RUNNING and the last behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com>
2011-03-23	signal: Fix premature completion of group stop when interfered by ptrace	Tejun Heo	1	-8/+54
	task->signal->group_stop_count is used to track the progress of group stop. It's initialized to the number of tasks which need to stop for group stop to finish and each stopping or trapping task decrements. However, each task doesn't keep track of whether it decremented the counter or not and if woken up before the group stop is complete and stops again, it can decrement the counter multiple times. Please consider the following example code. static void worker(void arg) { while (1) ; return NULL; } int main(void) { pthread_t thread; pid_t pid; int i; pid = fork(); if (!pid) { for (i = 0; i < 5; i++) pthread_create(&thread, NULL, worker, NULL); while (1) ; return 0; } ptrace(PTRACE_ATTACH, pid, NULL, NULL); while (1) { waitid(P_PID, pid, NULL, WSTOPPED); ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP); } return 0; } The child creates five threads and the parent continuously traps the first thread and whenever the child gets a signal, SIGSTOP is delivered. If an external process sends SIGSTOP to the child, all other threads in the process should reliably stop. However, due to the above bug, the first thread will often end up consuming group_stop_count multiple times and SIGSTOP often ends up stopping none or part of the other four threads. This patch adds a new field task->group_stop which is protected by siglock and uses GROUP_STOP_CONSUME flag to track which task is still to consume group_stop_count to fix this bug. task_clear_group_stop_pending() and task_participate_group_stop() are added to help manipulating group stop states. As ptrace_stop() now also uses task_participate_group_stop(), it will set SIGNAL_STOP_STOPPED if it completes a group stop. There still are many issues regarding the interaction between group stop and ptrace. Patches to address them will follow. - Oleg spotted duplicate GROUP_STOP_CONSUME. Dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com>
2011-03-23	ptrace: Add @why to ptrace_stop()	Tejun Heo	1	-4/+4
	To prepare for cleanup of the interaction between group stop and ptrace, add @why to ptrace_stop(). Existing users are updated such that there is no behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Roland McGrath <roland@redhat.com>
2011-03-23	ptrace: Kill tracehook_notify_jctl()	Tejun Heo	1	-20/+14
	tracehook_notify_jctl() aids in determining whether and what to report to the parent when a task is stopped or continued. The function also adds an extra requirement that siglock may be released across it, which is currently unused and quite difficult to satisfy in well-defined manner. As job control and the notifications are about to receive major overhaul, remove the tracehook and open code it. If ever necessary, let's factor it out after the overhaul. * Oleg spotted incorrect CLD_CONTINUED/STOPPED selection when ptraced. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com>
2011-03-23	signal: Remove superflous try_to_freeze() loop in do_signal_stop()	Tejun Heo	1	-3/+1
	do_signal_stop() is used only by get_signal_to_deliver() and after a successful signal stop, it always calls try_to_freeze(), so the try_to_freeze() loop around schedule() in do_signal_stop() is superflous and confusing. Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com>
2011-03-23	signal: Fix SIGCONT notification code	Tejun Heo	1	-2/+7
	After a task receives SIGCONT, its parent is notified via SIGCHLD with its siginfo describing what the notified event is. If SIGCONT is received while the child process is stopped, the code should be CLD_CONTINUED. If SIGCONT is recieved while the child process is in the process of being stopped, it should be CLD_STOPPED. Which code to use is determined in prepare_signal() and recorded in signal->flags using SIGNAL_CLD_CONTINUED\|STOP flags. get_signal_deliver() should test these flags and then notify accoringly; however, it incorrectly tested SIGNAL_STOP_CONTINUED instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying CLD_CONTINUED if the signal is delivered before the task is wait(2)ed and CLD_STOPPED if the state was fetched already. Fix it by testing SIGNAL_CLD_CONTINUED. While at it, uncompress the ?: test into if/else clause for better readability. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com>
2011-03-21	Prevent rt_sigqueueinfo and rt_tgsigqueueinfo from spoofing the signal code	Julien Tinnes	1	-4/+12
	Userland should be able to trust the pid and uid of the sender of a signal if the si_code is SI_TKILL. Unfortunately, the kernel has historically allowed sigqueueinfo() to send any si_code at all (as long as it was negative - to distinguish it from kernel-generated signals like SIGILL etc), so it could spoof a SI_TKILL with incorrect siginfo values. Happily, it looks like glibc has always set si_code to the appropriate SI_QUEUE, so there are probably no actual user code that ever uses anything but the appropriate SI_QUEUE flag. So just tighten the check for si_code (we used to allow any negative value), and add a (one-time) warning in case there are binaries out there that might depend on using other si_code values. Signed-off-by: Julien Tinnes <jln@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27	signals: annotate lock context change on ptrace_stop()	Namhyung Kim	1	-0/+2
	ptrace_stop() releases and regrabs current->sighand->siglock but was missing proper annotation. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27	signals: annotate lock_task_sighand()	Namhyung Kim	1	-1/+2
	lock_task_sighand() grabs sighand->siglock in case of returning non-NULL but unlock_task_sighand() releases it unconditionally. This leads sparse to complain about the lock context imbalance. Rename and wrap lock_task_sighand() using __cond_lock() macro to make sparse happy. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-07	HWPOISON: Copy si_addr_lsb to user	Andi Kleen	1	-0/+8
	The original hwpoison code added a new siginfo field si_addr_lsb to pass the granuality of the fault address to user space. Unfortunately this field was never copied to user space. Fix this here. I added explicit checks for the MCEERR codes to avoid having to patch all potential callers to initialize the field. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-08-04	CRED: Fix RCU warning due to previous patch fixing __task_cred()'s checks	David Howells	1	-3/+6
	Commit 8f92054e7ca1 ("CRED: Fix __task_cred()'s lockdep check and banner comment") fixed the lockdep checks on __task_cred(). This has shown up a place in the signalling code where a lock should be held - namely that check_kill_permission() requires its callers to hold the RCU lock. Fix group_send_sig_info() to get the RCU read lock around its call to check_kill_permission(). Without this patch, the following warning can occur: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- kernel/signal.c:660 invoked rcu_dereference_check() without protection! ... Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27	exit: change zap_other_threads() to count sub-threads	Oleg Nesterov	1	-8/+9
	Change zap_other_threads() to return the number of other sub-threads found on ->thread_group list. Other changes are cosmetic: - change the code to use while_each_thread() helper - remove the obsolete comment about SIGKILL/SIGSTOP Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27	signals: check_kill_permission(): don't check creds if same_thread_group()	Oleg Nesterov	1	-2/+4
	Andrew Tridgell reports that aio_read(SIGEV_SIGNAL) can fail if the notification from the helper thread races with setresuid(), see http://samba.org/~tridge/junkcode/aio_uid.c This happens because check_kill_permission() doesn't permit sending a signal to the task with the different cred->xids. But there is not any security reason to check ->cred's when the task sends a signal (private or group-wide) to its sub-thread. Whatever we do, any thread can bypass all security checks and send SIGKILL to all threads, or it can block a signal SIG and do kill(gettid(), SIG) to deliver this signal to another sub-thread. Not to mention that CLONE_THREAD implies CLONE_VM. Change check_kill_permission() to avoid the credentials check when the sender and the target are from the same thread group. Also, move "cred = current_cred()" down to avoid calling get_current() twice. Note: David Howells pointed out we could relax this even more, the CLONE_SIGHAND (without CLONE_THREAD) case probably does not need these checks too. Roland said: : The glibc (libpthread) that does set*id across threads has : been in use for a while (2.3.4?), probably in distro's using kernels as old : or older than any active -stable streams. In the race in question, this : kernel bug is breaking valid POSIX application expectations. Reported-by: Andrew Tridgell <tridge@samba.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Jakub Jelinek <jakub@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Roland McGrath <roland@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: <stable@kernel.org> [all kernel versions] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-20	kdb: core for kgdb back end (2 of 2)	Jason Wessel	1	-0/+40
	This patch contains the hooks and instrumentation into kernel which live outside the kernel/debug directory, which the kdb core will call to run commands like lsmod, dmesg, bt etc... CC: linux-arch@vger.kernel.org Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Martin Hicks <mort@sgi.com>
2010-03-06	kernel core: use helpers for rlimits	Jiri Slaby	1	-1/+1
	Make sure compiler won't do weird things with limits. E.g. fetching them twice may return 2 different values after writable limits are implemented. I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource: add helpers for fetching rlimits") or ACCESS_ONCE if not applicable. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-03	Prioritize synchronous signals over 'normal' signals	Linus Torvalds	1	-13/+30
	This makes sure that we pick the synchronous signals caused by a processor fault over any pending regular asynchronous signals sent to use by [t]kill(). This is not strictly required semantics, but it makes it _much_ easier for programs like Wine that expect to find the fault information in the signal stack. Without this, if a non-synchronous signal gets picked first, the delayed asynchronous signal will have its signal context pointing to the new signal invocation, rather than the instruction that caused the SIGSEGV or SIGBUS in the first place. This is not all that pretty, and we're discussing making the synchronous signals more explicit rather than have these kinds of implicit preferences of SIGSEGV and friends. See for example http://bugzilla.kernel.org/show_bug.cgi?id=15395 for some of the discussion. But in the meantime this is a simple and fairly straightforward work-around, and the whole if (x & Y) x &= Y; thing can be compiled into (and gcc does do it) just three instructions: movq %rdx, %rax andl $Y, %eax cmovne %rax, %rdx so it is at least a simple solution to a subtle issue. Reported-and-tested-by: Pavel Vilim <wylda@volny.cz> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-11	kernel/signal.c: fix kernel information leak with print-fatal-signals=1	Andi Kleen	1	-1/+2
	When print-fatal-signals is enabled it's possible to dump any memory reachable by the kernel to the log by simply jumping to that address from user space. Or crash the system if there's some hardware with read side effects. The fatal signals handler will dump 16 bytes at the execution address, which is fully controlled by ring 3. In addition when something jumps to a unmapped address there will be up to 16 additional useless page faults, which might be potentially slow (and at least is not very efficient) Fortunately this option is off by default and only there on i386. But fix it by checking for kernel addresses and also stopping when there's a page fault. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-19	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	1	-11/+14
	* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sys: Fix missing rcu protection for __task_cred() access signals: Fix more rcu assumptions signal: Fix racy access to __task_cred in kill_pid_info_as_uid()
2009-12-16	signals: check ->group_stop_count after tracehook_get_signal()	Oleg Nesterov	1	-5/+4
	Move the call to do_signal_stop() down, after tracehook call. This makes ->group_stop_count condition visible to tracers before do_signal_stop() will participate in this group-stop. Currently the patch has no effect, tracehook_get_signal() always returns 0. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16	signals: kill force_sig_specific()	Oleg Nesterov	1	-6/+0
	Kill force_sig_specific(), this trivial wrapper has no callers. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16	signals: cosmetic, collect_signal: use SI_USER	Oleg Nesterov	1	-1/+1
	Trivial, s/0/SI_USER/ in collect_signal() for grep. This is a bit confusing, we don't know the source of this signal. But we don't care, and "info->si_code = 0" is imho worse. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16	signals: send_signal: use si_fromuser() to detect from_ancestor_ns	Oleg Nesterov	1	-3/+2
	Change send_signal() to use si_fromuser(). From now SEND_SIG_NOINFO triggers the "from_ancestor_ns" check. This fixes reparent_thread()->group_send_sig_info(pdeath_signal) behaviour, before this patch send_signal() does not detect the cross-namespace case when the child of the dying parent belongs to the sub-namespace. This patch can affect the behaviour of send_sig(), kill_pgrp() and kill_pid() when the caller sends the signal to the sub-namespace with "priv == 0" but surprisingly all callers seem to use them correctly, including disassociate_ctty(on_exit). Except: drivers/staging/comedi/drivers/addi-data/*.c incorrectly use send_sig(priv => 0). But his is minor and should be fixed anyway. Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16	signals: SEND_SIG_NOINFO should be considered as SI_FROMUSER()	Oleg Nesterov	1	-3/+13
	No changes in compiled code. The patch adds the new helper, si_fromuser() and changes check_kill_permission() to use this helper. The real effect of this patch is that from now we "officially" consider SEND_SIG_NOINFO signal as "from user-space" signals. This is already true if we look at the code which uses SEND_SIG_NOINFO, except __send_signal() has another opinion - see the next patch. The naming of these special SEND_SIG_XXX siginfo's is really bad imho. From __send_signal()'s pov they mean SEND_SIG_NOINFO from user SEND_SIG_PRIV from kernel SEND_SIG_FORCED no info Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-10	signals: Fix more rcu assumptions	Thomas Gleixner	1	-4/+4
	1) Remove the misleading comment in __sigqueue_alloc() which claims that holding a spinlock is equivalent to rcu_read_lock(). 2) Add a rcu_read_lock/unlock around the __task_cred() access in __sigqueue_alloc() This needs to be revisited to remove the remaining users of read_lock(&tasklist_lock) but that's outside the scope of this patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <20091210004703.269843657@linutronix.de>
2009-12-10	signal: Fix racy access to __task_cred in kill_pid_info_as_uid()	Thomas Gleixner	1	-7/+10
	kill_pid_info_as_uid() accesses __task_cred() without being in a RCU read side critical section. tasklist_lock is not protecting that when CONFIG_TREE_PREEMPT_RCU=y. Convert the whole tasklist_lock section to rcu and use lock_task_sighand to prevent the exit race. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <20091210004703.232302055@linutronix.de> Acked-by: Oleg Nesterov <oleg@redhat.com>
2009-12-05	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	1	-7/+20
	* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits) x86: Fix comments of register/stack access functions perf tools: Replace %m with %a in sscanf hw-breakpoints: Keep track of user disabled breakpoints tracing/syscalls: Make syscall events print callbacks static tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook perf: Don't free perf_mmap_data until work has been done perf_event: Fix compile error perf tools: Fix _GNU_SOURCE macro related strndup() build error trace_syscalls: Remove unused syscall_name_to_nr() trace_syscalls: Simplify syscall profile trace_syscalls: Remove duplicate init_enter_##sname() trace_syscalls: Add syscall_nr field to struct syscall_metadata trace_syscalls: Remove enter_id exit_id trace_syscalls: Set event_enter_##sname->data to its metadata trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit perf_event: Initialize data.period in perf_swevent_hrtimer() perf probe: Simplify event naming perf probe: Add --list option for listing current probe events perf probe: Add argv_split() from lib/argv_split.c perf probe: Move probe event utility functions to probe-event.c ...
2009-11-26	tracepoint: Add signal loss events	Masami Hiramatsu	1	-5/+14
	Add signal_overflow_fail and signal_lose_info tracepoints for signal-lost events. Changes in v3: - Add docbook style comments Changes in v2: - Use siginfo string macro Suggested-by: Roland McGrath <roland@redhat.com> Reviewed-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <20091124215658.30449.9934.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26	tracepoint: Add signal deliver event	Masami Hiramatsu	1	-0/+3
	Add a tracepoint where a process gets a signal. This tracepoint shows signal-number, sa-handler and sa-flag. Changes in v3: - Add docbook style comments Changes in v2: - Add siginfo argument - Fix comment Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Reviewed-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <20091124215651.30449.20926.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26	tracepoint: Move signal sending tracepoint to events/signal.h	Masami Hiramatsu	1	-2/+3
	Move signal sending event to events/signal.h. This patch also renames sched_signal_send event to signal_generate. Changes in v4: - Fix a typo of task_struct pointer. Changes in v3: - Add docbook style comments Changes in v2: - Add siginfo argument - Add siginfo storing macro Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Reviewed-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <20091124215645.30449.60208.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-09	signal: Print warning message when dropping signals	Naohiro Ooiwa	1	-13/+33
	When the system has too many timers or too many aggregate queued signals, the EAGAIN error is returned to application from kernel, including timer_create() [POSIX.1b]. It means that the app exceeded the limit of pending signals, but in general application writers do not expect this outcome and the current silent failure can cause rare app failures under very high load. This patch adds a new message when we reach the limit and if print_fatal_signals is enabled: task/1234: reached RLIMIT_SIGPENDING, dropping signal If you see this message and your system behaved unexpectedly, you can run following command to lift the limit: # ulimit -i unlimited With help from Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>. Signed-off-by: Naohiro Ooiwa <nooiwa@miraclelinux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Roland McGrath <roland@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: oleg@redhat.com LKML-Reference: <4AF6E7E2.9080406@miraclelinux.com> [ Modified a few small details, gave surrounding code some love. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24	signals: inline __fatal_signal_pending	Roland McGrath	1	-6/+0
	__fatal_signal_pending inlines to one instruction on x86, probably two instructions on other machines. It takes two longer x86 instructions just to call it and test its return value, not to mention the function itself. On my random x86_64 config, this saved 70 bytes of text (59 of those being __fatal_signal_pending itself). Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24	signals: introduce do_send_sig_info() helper	Oleg Nesterov	1	-29/+27
	Introduce do_send_sig_info() and convert group_send_sig_info(), send_sig_info(), do_send_specific() to use this helper. Hopefully it will have more users soon, it allows to specify specific/group behaviour via "bool group" argument. Shaves 80 bytes from .text. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stephane eranian <eranian@googlemail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24	signals: tracehook_notify_jctl change	Roland McGrath	1	-50/+47
	This changes tracehook_notify_jctl() so it's called with the siglock held, and changes its argument and return value definition. These clean-ups make it a better fit for what new tracing hooks need to check. Tracing needs the siglock here, held from the time TASK_STOPPED was set, to avoid potential SIGCONT races if it wants to allow any blocking in its tracing hooks. This also folds the finish_stop() function into its caller do_signal_stop(). The function is short, called only once and only unconditionally. It aids readability to fold it in. [oleg@redhat.com: do not call tracehook_notify_jctl() in TASK_STOPPED state] [oleg@redhat.com: introduce tracehook_finish_jctl() helper] Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24	ptrace: __ptrace_detach: do __wake_up_parent() if we reap the tracee	Oleg Nesterov	1	-9/+0
	The bug is old, it wasn't cause by recent changes. Test case: static void tfunc(void arg) { int pid = (long)arg; assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0); kill(pid, SIGKILL); sleep(1); return NULL; } int main(void) { pthread_t th; long pid = fork(); if (!pid) pause(); signal(SIGCHLD, SIG_IGN); assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0); int r = waitpid(-1, NULL, __WNOTHREAD); printf("waitpid: %d %m\n", r); return 0; } Before the patch this program hangs, after this patch waitpid() correctly fails with errno == -ECHILD. The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its ->real_parent is our sub-thread and we ignore SIGCHLD. But in this case we should wake up other threads which can sleep in do_wait(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-01	do_sigaltstack: small cleanups	Linus Torvalds	1	-4/+6
	The previous commit ("do_sigaltstack: avoid copying 'stack_t' as a structure to user space") fixed a real bug. This one just cleans up the copy from user space to that gcc can generate better code for it (and so that it looks the same as the later copy back to user space). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-01	do_sigaltstack: avoid copying 'stack_t' as a structure to user space	Linus Torvalds	1	-7/+8
	Ulrich Drepper correctly points out that there is generally padding in the structure on 64-bit hosts, and that copying the structure from kernel to user space can leak information from the kernel stack in those padding bytes. Avoid the whole issue by just copying the three members one by one instead, which also means that the function also can avoid the need for a stack frame. This also happens to match how we copy the new structure from user space, so it all even makes sense. [ The obvious solution of adding a memset() generates horrid code, gcc does really stupid things. ] Reported-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18	ptrace: do_notify_parent_cldstop: fix the wrong ->nsproxy usage	Oleg Nesterov	1	-1/+1
	If the non-traced sub-thread calls do_notify_parent_cldstop(), we send the notification to group_leader->real_parent and we report group_leader's pid. But, if group_leader is traced we use the wrong ->parent->nsproxy->pid_ns, the tracer and parent can live in different namespaces. Change the code to use "parent" instead of tsk->parent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18	ptrace: do not use task->ptrace directly in core kernel	Oleg Nesterov	1	-5/+5
	No functional changes. - Nobody except ptrace.c & co should use ptrace flags directly, we have task_ptrace() for that. - No need to specially check PT_PTRACED, we must not have other PT_ bits set without PT_PTRACED. And no need to know this flag exists. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-15	signal: fix __send_signal() false positive kmemcheck warning	Vegard Nossum	1	-3/+8
	This false positive is due to field padding in struct sigqueue. When this dynamically allocated structure is copied to the stack (in arch- specific delivery code), kmemcheck sees a read from the padding, which is, naturally, uninitialized. Hide the false positive using the __GFP_NOTRACK_FALSE_POSITIVE flag. Also made the rlimit override code a bit clearer by introducing a new variable. Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-11	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6	Linus Torvalds	1	-3/+8
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (44 commits) nommu: Provide mmap_min_addr definition. TOMOYO: Add description of lists and structures. TOMOYO: Remove unused field. integrity: ima audit dentry_open failure TOMOYO: Remove unused parameter. security: use mmap_min_addr indepedently of security models TOMOYO: Simplify policy reader. TOMOYO: Remove redundant markers. SELinux: define audit permissions for audit tree netlink messages TOMOYO: Remove unused mutex. tomoyo: avoid get+put of task_struct smack: Remove redundant initialization. integrity: nfsd imbalance bug fix rootplug: Remove redundant initialization. smack: do not beyond ARRAY_SIZE of data integrity: move ima_counts_get integrity: path_check update IMA: Add __init notation to ima functions IMA: Minimal IMA policy and boot param for TCB IMA policy selinux: remove obsolete read buffer limit from sel_read_bool ...
2009-06-10	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	1	-3/+1
	* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...
2009-05-08	Merge branch 'master' into next	James Morris	1	-14/+49

2009-04-30	signals: implement sys_rt_tgsigqueueinfo	Thomas Gleixner	1	-0/+26
	sys_kill has the per thread counterpart sys_tgkill. sigqueueinfo is missing a thread directed counterpart. Such an interface is important for migrating applications from other OSes which have the per thread delivery implemented. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Ulrich Drepper <drepper@redhat.com>
2009-04-30	signals: split do_tkill	Thomas Gleixner	1	-12/+18
	Split out the code from do_tkill to make it reusable by the follow up patch which implements sys_rt_tgsigqueueinfo Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
2009-04-30	SELinux: Don't flush inherited SIGKILL during execve()	David Howells	1	-3/+8
	Don't flush inherited SIGKILL during execve() in SELinux's post cred commit hook. This isn't really a security problem: if the SIGKILL came before the credentials were changed, then we were right to receive it at the time, and should honour it; if it came after the creds were changed, then we definitely should honour it; and in any case, all that will happen is that the process will be scrapped before it ever returns to userspace. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>