aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/vsyscall_64.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-09-03seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computingAndy Lutomirski1-1/+1
The secure_computing function took a syscall number parameter, but it only paid any attention to that parameter if seccomp mode 1 was enabled. Rather than coming up with a kludge to get the parameter to work in mode 2, just remove the parameter. To avoid churn in arches that don't have seccomp filters (and may not even support syscall_get_nr right now), this leaves the parameter in secure_computing_strict, which is now a real function. For ARM, this is a bit ugly due to the fact that ARM conditionally supports seccomp filters. Fixing that would probably only be a couple of lines of code, but it should be coordinated with the audit maintainers. This will be a slight slowdown on some arches. The right fix is to pass in all of seccomp_data instead of trying to make just the syscall nr part be fast. This is a prerequisite for making two-phase seccomp work cleanly. Cc: Russell King <linux@arm.linux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: x86@kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Kees Cook <keescook@chromium.org>
2014-07-25x86_64/vsyscall: Fix warn_bad_vsyscall log outputAndy Lutomirski1-4/+4
This commit in Linux 3.6: commit c767a54ba0657e52e6edaa97cbe0b0a8bf1c1655 Author: Joe Perches <joe@perches.com> Date: Mon May 21 19:50:07 2012 -0700 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> caused warn_bad_vsyscall to output garbage in the middle of the line. Revert the bad part of it. The printk in question isn't actually bare; the level is "%s". The bug this fixes is purely cosmetic; backports are optional. Cc: <stable@vger.kernel.org> # v3.6+ Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/03eac1f24110bbe496ecc12a4df467e0d88466d4.1406330947.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSOAndy Lutomirski1-11/+4
This makes the 64-bit and x32 vdsos use the same mechanism as the 32-bit vdso. Most of the churn is deleting all the old fixmap code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-04-07Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds1-1/+5
Pull CPU hotplug notifiers registration fixes from Rafael Wysocki: "The purpose of this single series of commits from Srivatsa S Bhat (with a small piece from Gautham R Shenoy) touching multiple subsystems that use CPU hotplug notifiers is to provide a way to register them that will not lead to deadlocks with CPU online/offline operations as described in the changelog of commit 93ae4f978ca7f ("CPU hotplug: Provide lockless versions of callback registration functions"). The first three commits in the series introduce the API and document it and the rest simply goes through the users of CPU hotplug notifiers and converts them to using the new method" * tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits) net/iucv/iucv.c: Fix CPU hotplug callback registration net/core/flow.c: Fix CPU hotplug callback registration mm, zswap: Fix CPU hotplug callback registration mm, vmstat: Fix CPU hotplug callback registration profile: Fix CPU hotplug callback registration trace, ring-buffer: Fix CPU hotplug callback registration xen, balloon: Fix CPU hotplug callback registration hwmon, via-cputemp: Fix CPU hotplug callback registration hwmon, coretemp: Fix CPU hotplug callback registration thermal, x86-pkg-temp: Fix CPU hotplug callback registration octeon, watchdog: Fix CPU hotplug callback registration oprofile, nmi-timer: Fix CPU hotplug callback registration intel-idle: Fix CPU hotplug callback registration clocksource, dummy-timer: Fix CPU hotplug callback registration drivers/base/topology.c: Fix CPU hotplug callback registration acpi-cpufreq: Fix CPU hotplug callback registration zsmalloc: Fix CPU hotplug callback registration scsi, fcoe: Fix CPU hotplug callback registration scsi, bnx2fc: Fix CPU hotplug callback registration scsi, bnx2i: Fix CPU hotplug callback registration ...
2014-03-20x86, vsyscall: Fix CPU hotplug callback registrationSrivatsa S. Bhat1-1/+5
Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the vsyscall code in x86 by using this latter form of callback registration. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-18x86, vdso: Make vsyscall_gtod_data handling x86 genericStefani Seibold1-45/+0
This patch move the vsyscall_gtod_data handling out of vsyscall_64.c into an additonal file vsyscall_gtod.c to make the functionality available for x86 32 bit kernel. It also adds a new vsyscall_32.c which setup the VVAR page. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-2-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-14x86: delete __cpuinit usage from all x86 filesPaul Gortmaker1-3/+3
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-16Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds1-51/+59
Pull security subsystem updates from James Morris: "A quiet cycle for the security subsystem with just a few maintenance updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Smack: create a sysfs mount point for smackfs Smack: use select not depends in Kconfig Yama: remove locking from delete path Yama: add RCU to drop read locking drivers/char/tpm: remove tasklet and cleanup KEYS: Use keyring_alloc() to create special keyrings KEYS: Reduce initial permissions on keys KEYS: Make the session and process keyrings per-thread seccomp: Make syscall skipping and nr changes more consistent key: Fix resource leak keys: Fix unreachable code KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-10-02seccomp: Make syscall skipping and nr changes more consistentAndy Lutomirski1-51/+59
This fixes two issues that could cause incompatibility between kernel versions: - If a tracer uses SECCOMP_RET_TRACE to select a syscall number higher than the largest known syscall, emulate the unknown vsyscall by returning -ENOSYS. (This is unlikely to make a noticeable difference on x86-64 due to the way the system call entry works.) - On x86-64 with vsyscall=emulate, skipped vsyscalls were buggy. This updates the documentation accordingly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Will Drewry <wad@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-09-24time: Convert x86_64 to using new update_vsyscallJohn Stultz1-19/+28
Switch x86_64 to using sub-ns precise vsyscall Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-09-24time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLDJohn Stultz1-1/+1
To help migrate archtectures over to the new update_vsyscall method, redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-09-24time: Move update_vsyscall definitions to timekeeper_internal.hJohn Stultz1-1/+1
Since users will need to include timekeeper_internal.h, move update_vsyscall definitions to timekeeper_internal.h. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-07-22Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-10/+7
Pull debug-for-linus git tree from Ingo Molnar. Fix up trivial conflict in arch/x86/kernel/cpu/perf_event_intel.c due to a printk() having changed to a pr_info() differently in the two branches. * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move call to print_modules() out of show_regs() x86/mm: Mark free_initrd_mem() as __init x86/microcode: Mark microcode_id[] as __initconst x86/nmi: Clean up register_nmi_handler() usage x86: Save cr2 in NMI in case NMIs take a page fault (for i386) x86: Remove cmpxchg from i386 NMI nesting code x86: Save cr2 in NMI in case NMIs take a page fault x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
2012-07-14vsyscall_64: add missing ifdef CONFIG_SECCOMPWill Drewry1-0/+4
vsyscall_seccomp introduced a dependency on __secure_computing. On configurations with CONFIG_SECCOMP disabled, compilation will fail. Reported-by: feng xiangjun <fengxj325@gmail.com> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13x86/vsyscall: allow seccomp filter in vsyscall=emulateWill Drewry1-4/+31
If a seccomp filter program is installed, older static binaries and distributions with older libc implementations (glibc 2.13 and earlier) that rely on vsyscall use will be terminated regardless of the filter program policy when executing time, gettimeofday, or getcpu. This is only the case when vsyscall emulation is in use (vsyscall=emulate is the default). This patch emulates system call entry inside a vsyscall=emulate by populating regs->ax and regs->orig_ax with the system call number prior to calling into seccomp such that all seccomp-dependencies function normally. Additionally, system call return behavior is emulated in line with other vsyscall entrypoints for the trace/trap cases. [ v2: fixed ip and sp on SECCOMP_RET_TRAP/TRACE (thanks to luto@mit.edu) ] Reported-and-tested-by: Owen Kibel <qmewlo@gmail.com> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-06x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>Joe Perches1-10/+7
Use a more current logging style: - Bare printks should have a KERN_<LEVEL> for consistency's sake - Add pr_fmt where appropriate - Neaten some macro definitions - Convert some Ok output to OK - Use "%s: ", __func__ in pr_fmt for summit - Convert some printks to pr_<level> Message output is not identical in all cases. Signed-off-by: Joe Perches <joe@perches.com> Cc: levinsasha928@gmail.com Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop [ merged two similar patches, tidied up the changelog ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-06x86: vsyscall: Use NULL instead 0 for a pointer argumentEmil Goode1-3/+3
This patch silences the following sparse warning: arch/x86/kernel/vsyscall_64.c:250:34: warning: Using plain integer as NULL pointer Signed-off-by: Emil Goode <emilgoode@gmail.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: john.stultz@linaro.org Link: http://lkml.kernel.org/r/1333306084-3776-1-git-send-email-emilgoode@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-03-29Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull x86 cleanups from Peter Anvin: "The biggest textual change is the cleanup to use symbolic constants for x86 trap values. The only *functional* change and the reason for the x86/x32 dependency is the move of is_ia32_task() into <asm/thread_info.h> so that it can be used in other code that needs to understand if a system call comes from the compat entry point (and therefore uses i386 system call numbers) or not. One intended user for that is the BPF system call filter. Moving it out of <asm/compat.h> means we can define it unconditionally, returning always true on i386." * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move is_ia32_task to asm/thread_info.h from asm/compat.h x86: Rename trap_no to trap_nr in thread_struct x86: Use enum instead of literals for trap values
2012-03-24x86: vdso: Put declaration before codeThomas Gleixner1-1/+2
Sigh, warnings are there for a reason. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: John Stultz <john.stultz@linaro.org>
2012-03-23x86-64: Simplify and optimize vdso clock_gettime monotonic variantsAndy Lutomirski1-1/+9
We used to store the wall-to-monotonic offset and the realtime base. It's faster to precompute the monotonic base. This is about a 3% speedup on Sandy Bridge for CLOCK_MONOTONIC. It's much more impressive for CLOCK_MONOTONIC_COARSE. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-15x86: vdso: Use seqcount instead of seqlockThomas Gleixner1-8/+3
The update of the vdso data happens under xtime_lock, so adding a nested lock is pointless. Just use a seqcount to sync the readers. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-15x86: vdso: Remove bogus locking in update_vsyscall_tz()Thomas Gleixner1-5/+0
Changing the sequence count in update_vsyscall_tz() is completely pointless. The vdso code copies the data unprotected. There is no point to change this as sys_tz is nowhere protected at all. See sys_gettimeofday(). Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-13x86: Rename trap_no to trap_nr in thread_structSrikar Dronamraju1-1/+1
There are precedences of trap number being referred to as trap_nr. However thread struct refers trap number as trap_no. Change it to trap_nr. Also use enum instead of left-over literals for trap values. This is pure cleanup, no functional change intended. Suggested-by: Ingo Molnar <mingo@eltu.hu> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120312092555.5379.942.sendpatchset@srdronam.in.ibm.com [ Fixed the math-emu build ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Default to vsyscall=emulateAndy Lutomirski1-1/+1
This essentially reverts: 2b666859ec32: x86: Default to vsyscall=native for now The ABI breakage should now be fixed by: commit 48c4206f5b02f28c4c78a1f5b491d3772fb64fb9 Author: Andy Lutomirski <luto@mit.edu> Date: Thu Oct 20 08:48:19 2011 -0700 x86-64: Set siginfo and context on vsyscall emulation faults Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Adrian Bunk <bunk@stusta.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/93154af3b2b6d208906ae02d80d92cf60c6fa94f.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86-64: Set siginfo and context on vsyscall emulation faultsAndy Lutomirski1-8/+67
To make this work, we teach the page fault handler how to send signals on failed uaccess. This only works for user addresses (kernel addresses will never hit the page fault handler in the first place), so we need to generate signals for those separately. This gets the tricky case right: if the user buffer spans multiple pages and only the second page is invalid, we set cr2 and si_addr correctly. UML relies on this behavior to "fault in" pages as needed. We steal a bit from thread_info.uaccess_err to enable this. Before this change, uaccess_err was a 32-bit boolean value. This fixes issues with UML when vsyscall=emulate. Reported-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-31x86: fix implicit include of <linux/topology.h> in vsyscall_64Paul Gortmaker1-0/+1
In removing the presence of <linux/module.h> from some of the more common <linux/something.h> files, this implict include of <linux/topology.h> was uncovered. CC arch/x86/kernel/vsyscall_64.o arch/x86/kernel/vsyscall_64.c: In function ‘vsyscall_set_cpu’: arch/x86/kernel/vsyscall_64.c:259: error: implicit declaration of function ‘cpu_to_node’ Explicitly call it out so the cleanup can take place. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-11x86: Default to vsyscall=native for nowAdrian Bunk1-1/+1
This UML breakage: linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790 linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790 Is caused by commit 3ae36655 ("x86-64: Rework vsyscall emulation and add vsyscall= parameter") - the vsyscall emulation code is not fully cooked yet as UML relies on some rather fragile SIGSEGV semantics. Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to vsyscall=native for now, this patch implements that. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Andrew Lutomirski <luto@mit.edu> Cc: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-10x86-64: Rework vsyscall emulation and add vsyscall= parameterAndy Lutomirski1-30/+49
There are three choices: vsyscall=native: Vsyscalls are native code that issues the corresponding syscalls. vsyscall=emulate (default): Vsyscalls are emulated by instruction fault traps, tested in the bad_area path. The actual contents of the vsyscall page is the same as the vsyscall=native case except that it's marked NX. This way programs that make assumptions about what the code in the page does will not be confused when they read that code. vsyscall=none: Trying to execute a vsyscall will segfault. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-10x86: Remove unnecessary compile flag tweaks for vsyscall codeAndy Lutomirski1-3/+0
As of commit 98d0ac38ca7b1b7a552c9a2359174ff84decb600 Author: Andy Lutomirski <luto@mit.edu> Date: Thu Jul 14 06:47:22 2011 -0400 x86-64: Move vread_tsc and vread_hpet into the vDSO user code no longer directly calls into code in arch/x86/kernel/, so we don't need compile flag hacks to make it safe. All vdso code is in the vdso directory now. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/835cd05a4c7740544d09723d6ba48f4406f9826c.1312988155.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-04x86-64: Add vsyscall:emulate_vsyscall trace eventAndy Lutomirski1-0/+6
Vsyscall emulation is slow, so make it easy to track down. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/cdaad7da946a80b200df16647c1700db3e1171e9.1312378163.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-04x86-64: Add user_64bit_mode paravirt opAndy Lutomirski1-5/+1
Three places in the kernel assume that the only long mode CPL 3 selector is __USER_CS. This is not true on Xen -- Xen's sysretq changes cs to the magic value 0xe033. Two of the places are corner cases, but as of "x86-64: Improve vsyscall emulation CS and RIP handling" (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault if called with Xen's extra CS selector. This causes a panic when older init builds die. It seems impossible to make Xen use __USER_CS reliably without taking a performance hit on every system call, so this fixes the tests instead with a new paravirt op. It's a little ugly because ptrace.h can't include paravirt.h. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14x86-64: Move vread_tsc and vread_hpet into the vDSOAndy Lutomirski1-1/+1
The vsyscall page now consists entirely of trap instructions. Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13clocksource: Replace vread with generic arch dataAndy Lutomirski1-1/+1
The vread field was bloating struct clocksource everywhere except x86_64, and I want to change the way this works on x86_64, so let's split it out into per-arch data. Cc: x86@kernel.org Cc: Clemens Ladisch <clemens@ladisch.de> Cc: linux-ia64@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/3ae5ec76a168eaaae63f08a2a1060b91aa0b7759.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13x86-64: Improve vsyscall emulation CS and RIP handlingAndy Lutomirski1-20/+41
Three fixes here: - Send SIGSEGV if called from compat code or with a funny CS. - Don't BUG on impossible addresses. - Add a missing local_irq_disable. This patch also removes an unused variable. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/6fb2b13ab39b743d1e4f466eef13425854912f7f.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-06-07x86-64: Emulate legacy vsyscallsAndy Lutomirski1-139/+122
There's a fair amount of code in the vsyscall page. It contains a syscall instruction (in the gettimeofday fallback) and who knows what will happen if an exploit jumps into the middle of some other code. Reduce the risk by replacing the vsyscalls with short magic incantations that cause the kernel to emulate the real vsyscalls. These incantations are useless if entered in the middle. This causes vsyscalls to be a little more expensive than real syscalls. Fortunately sensible programs don't use them. The only exception is time() which is still called by glibc through the vsyscall - but calling time() millions of times per second is not sensible. glibc has this fixed in the development tree. This patch is not perfect: the vread_tsc and vread_hpet functions are still at a fixed address. Fixing that might involve making alternative patching work in the vDSO. Signed-off-by: Andy Lutomirski <luto@mit.edu> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu [ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-06x86-64: Remove vsyscall number 3 (venosys)Andy Lutomirski1-5/+0
It just segfaults since April 2008 (a4928cff), so I'm pretty sure that nothing uses it. And having an empty section makes the linker script a bit fragile. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/4a4abcf47ecadc269f2391a313576fe6d06acef7.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-05x86-64: Remove kernel.vsyscall64 sysctlAndy Lutomirski1-33/+1
It's unnecessary overhead in code that's supposed to be highly optimized. Removing it allows us to remove one of the two syscall instructions in the vsyscall page. The only sensible use for it is for UML users, and it doesn't fully address inconsistent vsyscall results on UML. The real fix for UML is to stop using vsyscalls entirely. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/973ae803fe76f712da4b2740e66dccf452d3b1e4.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-05x86-64: Give vvars their own pageAndy Lutomirski1-0/+5
Move vvars out of the vsyscall page into their own page and mark it NX. Without this patch, an attacker who can force a daemon to call some fixed address could wait until the time contains, say, 0xCD80, and then execute the current time. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/b1460f81dc4463d66ea3f2b5ce240f58d48effec.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-26Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds1-26/+20
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: vdso: Remove unused variable x86-64: Optimize vDSO time() x86-64: Add time to vDSO x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO x86-64: Move vread_tsc into a new file with sensible options x86-64: Vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0 x86-64: Don't generate cmov in vread_tsc x86-64: Remove unnecessary barrier in vread_tsc x86-64: Clean up vdso/kernel shared variables
2011-05-24seqlock: Get rid of SEQLOCK_UNLOCKEDEric Dumazet1-1/+1
All static seqlock should be initialized with the lockdep friendly __SEQLOCK_UNLOCKED() macro. Remove legacy SEQLOCK_UNLOCKED() macro. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Link: http://lkml.kernel.org/r/%3C1306238888.3026.31.camel%40edumazet-laptop%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24x86-64: Clean up vdso/kernel shared variablesAndy Lutomirski1-26/+20
Variables that are shared between the vdso and the kernel are currently a bit of a mess. They are each defined with their own magic, they are accessed differently in the kernel, the vsyscall page, and the vdso, and one of them (vsyscall_clock) doesn't even really exist. This changes them all to use a common mechanism. All of them are delcared in vvar.h with a fixed address (validated by the linker script). In the kernel (as before), they look like ordinary read-write variables. In the vsyscall page and the vdso, they are accessed through a new macro VVAR, which gives read-only access. The vdso is now loaded verbatim into memory without any fixups. As a side bonus, access from the vdso is faster because a level of indirection is removed. While we're at it, pack jiffies and vgetcpu_mode into the same cacheline. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27timkeeping: Fix update_vsyscall to provide wall_to_monotonic offsetJohn Stultz1-3/+3
update_vsyscall() did not provide the wall_to_monotoinc offset, so arch specific implementations tend to reference wall_to_monotonic directly. This limits future cleanups in the timekeeping core, so this patch fixes the update_vsyscall interface to provide wall_to_monotonic, allowing wall_to_monotonic to be made static as planned in Documentation/feature-removal-schedule.txt Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27x86: Fix vtime/file timestamp inconsistenciesJohn Stultz1-3/+8
Due to vtime calling vgettimeofday(), its possible that an application could call time();create("stuff",O_RDRW); only to see the file's creation timestamp to be before the value returned by time. A similar way to reproduce the issue is to compare the vsyscall time() with the syscall time(), and observe ordering issues. The modified test case from Oleg Nesterov below can illustrate this: int main(void) { time_t sec1,sec2; do { sec1 = time(&sec2); sec2 = syscall(__NR_time, NULL); } while (sec1 <= sec2); printf("vtime: %d.000000\n", sec1); printf("time: %d.000000\n", sec2); return 0; } The proper fix is to make vtime use the same time value as current_kernel_time() (which is exported via update_vsyscall) instead of vgettime(). Thanks to Jiri Olsa for bringing up the issue and catching bugs in earlier verisons of this fix. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <1279068988-21864-2-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-03-01x86: Raise vsyscall priority on hotplug notifier chainSheng Yang1-1/+2
KVM need vsyscall_init() to initialize MSR_TSC_AUX before it read the value. Per Avi's suggestion, this patch raised vsyscall priority on hotplug notifier chain, to 30. CC: Ingo Molnar <mingo@elte.hu> CC: linux-kernel@vger.kernel.org Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-08Merge branch 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds1-2/+3
* 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: Fix /proc/timer_list regression itimers: Fix racy writes to cpu_itimer fields timekeeping: Fix clock_gettime vsyscall time warp
2009-11-17timekeeping: Fix clock_gettime vsyscall time warpLin Ming1-2/+3
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with the unmodified clock multiplier of the clock source and not with the NTP adjusted multiplier of the timekeeper. This causes user space observerable time warps: new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf Add a new argument "mult" to update_vsyscall() and hand in the timekeeping internal NTP adjusted multiplier. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-12sysctl x86: Remove dead binary sysctl supportEric W. Biederman1-1/+1
Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name and .strategy members of sysctl tables are dead code. Remove them. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-09-24sysctl: remove "struct file *" argument of ->proc_handlerAlexey Dobriyan1-9/+1
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-21time: Introduce CLOCK_REALTIME_COARSEjohn stultz1-0/+1
After talking with some application writers who want very fast, but not fine-grained timestamps, I decided to try to implement new clock_ids to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE which returns the time at the last tick. This is very fast as we don't have to access any hardware (which can be very painful if you're using something like the acpi_pm clocksource), and we can even use the vdso clock_gettime() method to avoid the syscall. The only trade off is you only get low-res tick grained time resolution. This isn't a new idea, I know Ingo has a patch in the -rt tree that made the vsyscall gettimeofday() return coarse grained time when the vsyscall64 sysctrl was set to 2. However this affects all applications on a system. With this method, applications can choose the proper speed/granularity trade-off for themselves. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: nikolag@ca.ibm.com Cc: Darren Hart <dvhltc@us.ibm.com> Cc: arjan@infradead.org Cc: jonathan@jonmasters.org LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-05-28x86: move rdtsc_barrier() into the TSC vread methodPetr Tesarik1-8/+0
The *fence instructions were moved to vsyscall_64.c by commit cb9e35dce94a1b9c59d46224e8a94377d673e204. But this breaks the vDSO, because vread methods are also called from there. Besides, the synchronization might be unnecessary for other time sources than TSC. [ Impact: fix potential time warp in VDSO ] Signed-off-by: Petr Tesarik <ptesarik@suse.cz> LKML-Reference: <9d0ea9ea0f866bdc1f4d76831221ae117f11ea67.1243241859.git.ptesarik@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org>