aboutsummaryrefslogtreecommitdiffstats
path: root/lib/lru_cache.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2012-04-19Yama: add additional ptrace scopesKees Cook2-12/+60
This expands the available Yama ptrace restrictions to include two more modes. Mode 2 requires CAP_SYS_PTRACE for PTRACE_ATTACH, and mode 3 completely disables PTRACE_ATTACH (and locks the sysctl). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-18seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTERWill Drewry1-4/+9
If both audit and seccomp filter support are disabled, 'ret' is marked as unused. If just seccomp filter support is disabled, data and skip are considered unused. This change fixes those build warnings. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-18seccomp: ignore secure_computing return valuesWill Drewry8-7/+14
This change is inspired by https://lkml.org/lkml/2012/4/16/14 which fixes the build warnings for arches that don't support CONFIG_HAVE_ARCH_SECCOMP_FILTER. In particular, there is no requirement for the return value of secure_computing() to be checked unless the architecture supports seccomp filter. Instead of silencing the warnings with (void) a new static inline is added to encode the expected behavior in a compiler and human friendly way. v2: - cleans things up with a static inline - removes sfr's signed-off-by since it is a different approach v1: - matches sfr's original change Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-17seccomp: use a static inline for a function stubStephen Rothwell1-1/+1
Fixes this error message when CONFIG_SECCOMP is not set: arch/powerpc/kernel/ptrace.c: In function 'do_syscall_trace_enter': arch/powerpc/kernel/ptrace.c:1713:2: error: statement with no effect [-Werror=unused-value] Signed-off-by: Stephen Rothwell <sfr@ozlabs.au.ibm.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14Documentation: prctl/seccomp_filterWill Drewry8-1/+875
Documents how system call filtering using Berkeley Packet Filter programs works and how it may be used. Includes an example for x86 and a semi-generic example using a macro-based code generator. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> v18: - added acked by - update no new privs numbers v17: - remove @compat note and add Pitfalls section for arch checking (keescook@chromium.org) v16: - v15: - v14: - rebase/nochanges v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - comment on the ptrace_event use - update arch support comment - note the behavior of SECCOMP_RET_DATA when there are multiple filters (keescook@chromium.org) - lots of samples/ clean up incl 64-bit bpf-direct support (markus@chromium.org) - rebase to linux-next v11: - overhaul return value language, updates (keescook@chromium.org) - comment on do_exit(SIGSYS) v10: - update for SIGSYS - update for new seccomp_data layout - update for ptrace option use v9: - updated bpf-direct.c for SIGILL v8: - add PR_SET_NO_NEW_PRIVS to the samples. v7: - updated for all the new stuff in v7: TRAP, TRACE - only talk about PR_SET_SECCOMP now - fixed bad JLE32 check (coreyb@linux.vnet.ibm.com) - adds dropper.c: a simple system call disabler v6: - tweak the language to note the requirement of PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu) v5: - update sample to use system call arguments - adds a "fancy" example using a macro-based generator - cleaned up bpf in the sample - update docs to mention arguments - fix prctl value (eparis@redhat.com) - language cleanup (rdunlap@xenotime.net) v4: - update for no_new_privs use - minor tweaks v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net) - document use of tentative always-unprivileged - guard sample compilation for i386 and x86_64 v2: - move code to samples (corbet@lwn.net) Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14x86: Enable HAVE_ARCH_SECCOMP_FILTERWill Drewry2-1/+7
Enable support for seccomp filter on x86: - syscall_get_arch() - syscall_get_arguments() - syscall_rollback() - syscall_set_return_value() - SIGSYS siginfo_t support - secure_computing is called from a ptrace_event()-safe context - secure_computing return value is checked (see below). SECCOMP_RET_TRACE and SECCOMP_RET_TRAP may result in seccomp needing to skip a system call without killing the process. This is done by returning a non-zero (-1) value from secure_computing. This change makes x86 respect that return value. To ensure that minimal kernel code is exposed, a non-zero return value results in an immediate return to user space (with an invalid syscall number). Signed-off-by: Will Drewry <wad@chromium.org> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> v18: rebase and tweaked change description, acked-by v17: added reviewed by and rebased v..: all rebases since original introduction. Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14ptrace,seccomp: Add PTRACE_SECCOMP supportWill Drewry4-6/+26
This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP, and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE. When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that results in a BPF program returning SECCOMP_RET_TRACE. The 16-bit SECCOMP_RET_DATA mask of the BPF program return value will be passed as the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG. If the subordinate process is not using seccomp filter, then no system call notifications will occur even if the option is specified. If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE is returned, the system call will not be executed and an -ENOSYS errno will be returned to userspace. This change adds a dependency on the system call slow path. Any future efforts to use the system call fast path for seccomp filter will need to address this restriction. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> v18: - rebase - comment fatal_signal check - acked-by - drop secure_computing_int comment v17: - ... v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu) [note PT_TRACE_MASK disappears in linux-next] v15: - add audit support for non-zero return codes - clean up style (indan@nul.nu) v14: - rebase/nochanges v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc (Brings back a change to ptrace.c and the masks.) v12: - rebase to linux-next - use ptrace_event and update arch/Kconfig to mention slow-path dependency - drop all tracehook changes and inclusion (oleg@redhat.com) v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator (indan@nul.nu) v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP v9: - n/a v8: - guarded PTRACE_SECCOMP use with an ifdef v7: - introduced Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14seccomp: Add SECCOMP_RET_TRAPWill Drewry4-6/+37
Adds a new return value to seccomp filters that triggers a SIGSYS to be delivered with the new SYS_SECCOMP si_code. This allows in-process system call emulation, including just specifying an errno or cleanly dumping core, rather than just dying. Suggested-by: Markus Gutschke <markus@chromium.org> Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> v18: - acked-by, rebase - don't mention secure_computing_int() anymore v15: - use audit_seccomp/skip - pad out error spacing; clean up switch (indan@nul.nu) v14: - n/a v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - rebase on to linux-next v11: - clarify the comment (indan@nul.nu) - s/sigtrap/sigsys v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig note suggested-by (though original suggestion had other behaviors) v9: - changes to SIGILL v8: - clean up based on changes to dependent patches v7: - introduction Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14signal, x86: add SIGSYS info and make it synchronous.Will Drewry4-1/+40
This change enables SIGSYS, defines _sigfields._sigsys, and adds x86 (compat) arch support. _sigsys defines fields which allow a signal handler to receive the triggering system call number, the relevant AUDIT_ARCH_* value for that number, and the address of the callsite. SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it to have setup_frame() called for it. The goal is to ensure that ucontext_t reflects the machine state from the time-of-syscall and not from another signal handler. The first consumer of SIGSYS would be seccomp filter. In particular, a filter program could specify a new return value, SECCOMP_RET_TRAP, which would result in the system call being denied and the calling thread signaled. This also means that implementing arch-specific support can be dependent upon HAVE_ARCH_SECCOMP_FILTER. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Eric Paris <eparis@redhat.com> v18: - added acked by, rebase v17: - rebase and reviewed-by addition v14: - rebase/nochanges v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - reworded changelog (oleg@redhat.com) v11: - fix dropped words in the change description - added fallback copy_siginfo support. - added __ARCH_SIGSYS define to allow stepped arch support. v10: - first version based on suggestion Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14seccomp: add SECCOMP_RET_ERRNOWill Drewry3-16/+42
This change adds the SECCOMP_RET_ERRNO as a valid return value from a seccomp filter. Additionally, it makes the first use of the lower 16-bits for storing a filter-supplied errno. 16-bits is more than enough for the errno-base.h calls. Returning errors instead of immediately terminating processes that violate seccomp policy allow for broader use of this functionality for kernel attack surface reduction. For example, a linux container could maintain a whitelist of pre-existing system calls but drop all new ones with errnos. This would keep a logically static attack surface while providing errnos that may allow for graceful failure without the downside of do_exit() on a bad call. This change also changes the signature of __secure_computing. It appears the only direct caller is the arm entry code and it clobbers any possible return value (register) immediately. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> v18: - fix up comments and rebase - fix bad var name which was fixed in later revs - remove _int() and just change the __secure_computing signature v16-v17: ... v15: - use audit_seccomp and add a skip label. (eparis@redhat.com) - clean up and pad out return codes (indan@nul.nu) v14: - no change/rebase v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - move to WARN_ON if filter is NULL (oleg@redhat.com, luto@mit.edu, keescook@chromium.org) - return immediately for filter==NULL (keescook@chromium.org) - change evaluation to only compare the ACTION so that layered errnos don't result in the lowest one being returned. (keeschook@chromium.org) v11: - check for NULL filter (keescook@chromium.org) v10: - change loaders to fn v9: - n/a v8: - update Kconfig to note new need for syscall_set_return_value. - reordered such that TRAP behavior follows on later. - made the for loop a little less indent-y v7: - introduced Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14seccomp: remove duplicated failure loggingKees Cook3-20/+11
This consolidates the seccomp filter error logging path and adds more details to the audit log. Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> v18: make compat= permanent in the record v15: added a return code to the audit_seccomp path by wad@chromium.org (suggested by eparis@redhat.com) v*: original by keescook@chromium.org Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14seccomp: add system call filtering using BPFWill Drewry6-23/+472
[This patch depends on luto@mit.edu's no_new_privs patch: https://lkml.org/lkml/2012/1/30/264 The whole series including Andrew's patches can be found here: https://github.com/redpig/linux/tree/seccomp Complete diff here: https://github.com/redpig/linux/compare/1dc65fed...seccomp ] This patch adds support for seccomp mode 2. Mode 2 introduces the ability for unprivileged processes to install system call filtering policy expressed in terms of a Berkeley Packet Filter (BPF) program. This program will be evaluated in the kernel for each system call the task makes and computes a result based on data in the format of struct seccomp_data. A filter program may be installed by calling: struct sock_fprog fprog = { ... }; ... prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog); The return value of the filter program determines if the system call is allowed to proceed or denied. If the first filter program installed allows prctl(2) calls, then the above call may be made repeatedly by a task to further reduce its access to the kernel. All attached programs must be evaluated before a system call will be allowed to proceed. Filter programs will be inherited across fork/clone and execve. However, if the task attaching the filter is unprivileged (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task. This ensures that unprivileged tasks cannot attach filters that affect privileged tasks (e.g., setuid binary). There are a number of benefits to this approach. A few of which are as follows: - BPF has been exposed to userland for a long time - BPF optimization (and JIT'ing) are well understood - Userland already knows its ABI: system call numbers and desired arguments - No time-of-check-time-of-use vulnerable data accesses are possible. - system call arguments are loaded on access only to minimize copying required for system call policy decisions. Mode 2 support is restricted to architectures that enable HAVE_ARCH_SECCOMP_FILTER. In this patch, the primary dependency is on syscall_get_arguments(). The full desired scope of this feature will add a few minor additional requirements expressed later in this series. Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be the desired additional functionality. No architectures are enabled in this patch. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: Indan Zupancic <indan@nul.nu> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> v18: - rebase to v3.4-rc2 - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org) - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu) - add a comment for get_u32 regarding endianness (akpm@) - fix other typos, style mistakes (akpm@) - added acked-by v17: - properly guard seccomp filter needed headers (leann@ubuntu.com) - tighten return mask to 0x7fff0000 v16: - no change v15: - add a 4 instr penalty when counting a path to account for seccomp_filter size (indan@nul.nu) - drop the max insns to 256KB (indan@nul.nu) - return ENOMEM if the max insns limit has been hit (indan@nul.nu) - move IP checks after args (indan@nul.nu) - drop !user_filter check (indan@nul.nu) - only allow explicit bpf codes (indan@nul.nu) - exit_code -> exit_sig v14: - put/get_seccomp_filter takes struct task_struct (indan@nul.nu,keescook@chromium.org) - adds seccomp_chk_filter and drops general bpf_run/chk_filter user - add seccomp_bpf_load for use by net/core/filter.c - lower max per-process/per-hierarchy: 1MB - moved nnp/capability check prior to allocation (all of the above: indan@nul.nu) v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com) - removed copy_seccomp (keescook@chromium.org,indan@nul.nu) - reworded the prctl_set_seccomp comment (indan@nul.nu) v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com) - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu) - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu) - pare down Kconfig doc reference. - extra comment clean up v10: - seccomp_data has changed again to be more aesthetically pleasing (hpa@zytor.com) - calling convention is noted in a new u32 field using syscall_get_arch. This allows for cross-calling convention tasks to use seccomp filters. (hpa@zytor.com) - lots of clean up (thanks, Indan!) v9: - n/a v8: - use bpf_chk_filter, bpf_run_filter. update load_fns - Lots of fixes courtesy of indan@nul.nu: -- fix up load behavior, compat fixups, and merge alloc code, -- renamed pc and dropped __packed, use bool compat. -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch dependencies v7: (massive overhaul thanks to Indan, others) - added CONFIG_HAVE_ARCH_SECCOMP_FILTER - merged into seccomp.c - minimal seccomp_filter.h - no config option (part of seccomp) - no new prctl - doesn't break seccomp on systems without asm/syscall.h (works but arg access always fails) - dropped seccomp_init_task, extra free functions, ... - dropped the no-asm/syscall.h code paths - merges with network sk_run_filter and sk_chk_filter v6: - fix memory leak on attach compat check failure - require no_new_privs || CAP_SYS_ADMIN prior to filter installation. (luto@mit.edu) - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com) - cleaned up Kconfig (amwang@redhat.com) - on block, note if the call was compat (so the # means something) v5: - uses syscall_get_arguments (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org) - uses union-based arg storage with hi/lo struct to handle endianness. Compromises between the two alternate proposals to minimize extra arg shuffling and account for endianness assuming userspace uses offsetof(). (mcgrathr@chromium.org, indan@nul.nu) - update Kconfig description - add include/seccomp_filter.h and add its installation - (naive) on-demand syscall argument loading - drop seccomp_t (eparis@redhat.com) v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS - now uses current->no_new_privs (luto@mit.edu,torvalds@linux-foundation.com) - assign names to seccomp modes (rdunlap@xenotime.net) - fix style issues (rdunlap@xenotime.net) - reworded Kconfig entry (rdunlap@xenotime.net) v3: - macros to inline (oleg@redhat.com) - init_task behavior fixed (oleg@redhat.com) - drop creator entry and extra NULL check (oleg@redhat.com) - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com) - adds tentative use of "always_unprivileged" as per torvalds@linux-foundation.org and luto@mit.edu v2: - (patch 2 only) Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14arch/x86: add syscall_get_arch to syscall.hWill Drewry1-0/+27
Add syscall_get_arch() to export the current AUDIT_ARCH_* based on system call entry path. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> v18: - update comment about x32 tasks - rebase to v3.4-rc2 v17: rebase and reviewed-by v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14asm/syscall.h: add syscall_get_archWill Drewry1-0/+14
Adds a stub for a function that will return the AUDIT_ARCH_* value appropriate to the supplied task based on the system call convention. For audit's use, the value can generally be hard-coded at the audit-site. However, for other functionality not inlined into syscall entry/exit, this makes that information available. seccomp_filter is the first planned consumer and, as such, the comment indicates a tie to CONFIG_HAVE_ARCH_SECCOMP_FILTER. Suggested-by: Roland McGrath <mcgrathr@chromium.org> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Eric Paris <eparis@redhat.com> v18: comment and change reword and rebase. v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: rebase on to linux-next v11: fixed improper return type v10: introduced Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14seccomp: kill the seccomp_t typedefWill Drewry2-5/+7
Replaces the seccomp_t typedef with struct seccomp to match modern kernel style. Signed-off-by: Will Drewry <wad@chromium.org> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Eric Paris <eparis@redhat.com> v18: rebase ... v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: rebase on to linux-next v8-v11: no changes v7: struct seccomp_struct -> struct seccomp v6: original inclusion in this series. Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14net/compat.c,linux/filter.h: share compat_sock_fprogWill Drewry2-8/+11
Any other users of bpf_*_filter that take a struct sock_fprog from userspace will need to be able to also accept a compat_sock_fprog if the arch supports compat calls. This change allows the existing compat_sock_fprog be shared. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Paris <eparis@redhat.com> v18: tasered by the apostrophe police v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: rebase on to linux-next v11: introduction Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14sk_run_filter: add BPF_S_ANC_SECCOMP_LD_WWill Drewry2-0/+7
Introduces a new BPF ancillary instruction that all LD calls will be mapped through when skb_run_filter() is being used for seccomp BPF. The rewriting will be done using a secondary chk_filter function that is run after skb_chk_filter. The code change is guarded by CONFIG_SECCOMP_FILTER which is added, along with the seccomp_bpf_load() function later in this series. This is based on http://lkml.org/lkml/2012/3/2/141 Suggested-by: Indan Zupancic <indan@nul.nu> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Paris <eparis@redhat.com> v18: rebase ... v15: include seccomp.h explicitly for when seccomp_bpf_load exists. v14: First cut using a single additional instruction ... v13: made bpf functions generic. Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14Fix execve behavior apparmor for PR_{GET,SET}_NO_NEW_PRIVSJohn Johansen1-4/+35
Add support for AppArmor to explicitly fail requested domain transitions if NO_NEW_PRIVS is set and the task is not unconfined. Transitions from unconfined are still allowed because this always results in a reduction of privileges. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> v18: new acked-by, new description Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privsAndy Lutomirski8-4/+55
With this change, calling prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) disables privilege granting operations at execve-time. For example, a process will not be able to execute a setuid binary to change their uid or gid if this bit is set. The same is true for file capabilities. Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that LSMs respect the requested behavior. To determine if the NO_NEW_PRIVS bit is set, a task may call prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0); It returns 1 if set and 0 if it is not set. If any of the arguments are non-zero, it will return -1 and set errno to -EINVAL. (PR_SET_NO_NEW_PRIVS behaves similarly.) This functionality is desired for the proposed seccomp filter patch series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the system call behavior for itself and its child tasks without being able to impact the behavior of a more privileged task. Another potential use is making certain privileged operations unprivileged. For example, chroot may be considered "safe" if it cannot affect privileged tasks. Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is set and AppArmor is in use. It is fixed in a subsequent patch. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> v18: updated change desc v17: using new define values as per 3.4 Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-09maintainers: update wiki url for the security subsystemJames Morris1-1/+1
Update the wiki url for the security subsystem to: http://kernsec.org/ Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-09maintainers: add kernel/capability.c to capabilities entryJames Morris1-0/+1
Add kernel/capability.c to capabilities entry. Reported-by: Eric Paris <eparis@parisplace.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-07Linux 3.4-rc2Linus Torvalds1-1/+1
2012-04-06tcm_fc: Do not free tpg structure during wq allocation failureMark Rustad1-5/+8
Avoid freeing a registered tpg structure if an alloc_workqueue call fails. This fixes a bug where the failure was leaking memory associated with se_portal_group setup during the original core_tpg_register() call. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Acked-by: Kiran Patil <Kiran.patil@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-04-06tcm_fc: Add abort flag for gracefully handling exchange timeoutMark Rustad3-2/+11
Add abort flag and use it to terminate processing when an exchange is timed out or is reset. The abort flag is used in place of the transport_generic_free_cmd function call in the reset and timeout cases, because calling that function in that context would free memory that was in use. The aborted flag allows the lifetime to be managed in a more normal way, while truncating the processing. This change eliminates a source of memory corruption which manifested in a variety of ugly ways. (nab: Drop unused struct fc_exch *ep in ft_recv_seq) Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Acked-by: Kiran Patil <Kiran.patil@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-04-06MAINTAINERS: Update git url for ACPIIgor Murzov1-1/+1
Signed-off-by: Igor Murzov <e-mail@date.by> Signed-off-by: Len Brown <len.brown@intel.com>
2012-04-06Make the "word-at-a-time" helper functions more commonly usableLinus Torvalds2-32/+49
I have a new optimized x86 "strncpy_from_user()" that will use these same helper functions for all the same reasons the name lookup code uses them. This is preparation for that. This moves them into an architecture-specific header file. It's architecture-specific for two reasons: - some of the functions are likely to want architecture-specific implementations. Even if the current code happens to be "generic" in the sense that it should work on any little-endian machine, it's likely that the "multiply by a big constant and shift" implementation is less than optimal for an architecture that has a guaranteed fast bit count instruction, for example. - I expect that if architectures like sparc want to start playing around with this, we'll need to abstract out a few more details (in particular the actual unaligned accesses). So we're likely to have more architecture-specific stuff if non-x86 architectures start using this. (and if it turns out that non-x86 architectures don't start using this, then having it in an architecture-specific header is still the right thing to do, of course) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-06cpuidle: Fix panic in CPU off-lining with no idle driverToshi Kani1-1/+4
Fix a NULL pointer dereference panic in cpuidle_play_dead() during CPU off-lining when no cpuidle driver is registered. A cpuidle driver may be registered at boot-time based on CPU type. This patch allows an off-lined CPU to enter HLT-based idle in this condition. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Cc: Boris Ostrovsky <boris.ostrovsky@amd.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Len Brown <len.brown@intel.com>
2012-04-06xen/pcifront: avoid pci_frontend_enable_msix() falsely returning successJan Beulich1-0/+1
The original XenoLinux code has always had things this way, and for compatibility reasons (in particular with a subsequent pciback adjustment) upstream Linux should behave the same way (allowing for two distinct error indications to be returned by the backend). Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen/pciback: fix XEN_PCI_OP_enable_msix resultJan Beulich1-1/+1
Prior to 2.6.19 and as of 2.6.31, pci_enable_msix() can return a positive value to indicate the number of vectors (less than the amount requested) that can be set up for a given device. Returning this as an operation value (secondary result) is fine, but (primary) operation results are expected to be negative (error) or zero (success) according to the protocol. With the frontend fixed to match the XenoLinux behavior, the backend can now validly return zero (success) here, passing the upper limit on the number of vectors in op->value. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen/smp: Remove unnecessary call to smp_processor_id()Srivatsa S. Bhat1-1/+1
There is an extra and unnecessary call to smp_processor_id() in cpu_bringup(). Remove it. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'Konrad Rzeszutek Wilk1-1/+3
The above mentioned patch checks the IOAPIC and if it contains -1, then it unmaps said IOAPIC. But under Xen we get this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0 PGD 0 Oops: 0002 [#1] SMP CPU 0 Modules linked in: Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron 1525 /0U990C RIP: e030:[<ffffffff8134e51f>] [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0 RSP: e02b: ffff8800d42cbb70 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001 RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001 RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010 FS: 0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0:000000008005003b CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000) Stack: 00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157 ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384 0000000000000002 0000000000000010 00000000ffffffff 0000000000000000 Call Trace: [<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230 [<ffffffff8100a9b2>] ? check_events+0x12+0x20 [<ffffffff814bab42>] xen_register_pirq+0x82/0xe0 [<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0 [<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30 [<ffffffff8103036f>] acpi_register_gsi+0xf/0x20 [<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202 [<ffffffff814bc849>] pcibios_enable_device+0x39/0x40 [<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70 [<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0 [<ffffffff812dc8d3>] pci_enable_device+0x13/0x20 The reason we are dying is b/c the call acpi_get_override_irq() is used, which returns the polarity and trigger for the IRQs. That function calls mp_find_ioapics to get the 'struct ioapic' structure - which along with the mp_irq[x] is used to figure out the default values and the polarity/trigger overrides. Since the mp_find_ioapics now returns -1 [b/c the IOAPIC is filled with 0xffffffff], the acpi_get_override_irq() stops trying to lookup in the mp_irq[x] the proper INT_SRV_OVR and we can't install the SCI interrupt. The proper fix for this is going in v3.5 and adds an x86_io_apic_ops struct so that platforms can override it. But for v3.4 lets carry this work-around. This patch does that by providing a slightly different variant of the fake IOAPIC entries. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen: only check xen_platform_pci_unplug if hvmIgor Mammedov2-2/+2
commit b9136d207f08 xen: initialize platform-pci even if xen_emul_unplug=never breaks blkfront/netfront by not loading them because of xen_platform_pci_unplug=0 and it is never set for PV guest. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06net: fix a race in sock_queue_err_skb()Eric Dumazet1-1/+3
As soon as an skb is queued into socket error queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06netlink: fix races after skb queueingEric Dumazet1-11/+13
As soon as an skb is queued into socket receive_queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06doc, net: Update ndo_start_xmit return type and valuesBen Hutchings1-10/+12
Commit dc1f8bf68b311b1537cb65893430b6796118498a ('netdev: change transmit to limited range type') changed the required return type and 9a1654ba0b50402a6bd03c7b0fe9b0200a5ea7b1 ('net: Optimize hard_start_xmit() return checking') changed the valid numerical return values. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06doc, net: Remove instruction to set net_device::trans_startBen Hutchings1-5/+2
Commit 08baf561083bc27a953aa087dd8a664bb2b88e8e ('net: txq_trans_update() helper') made it unnecessary for most drivers to set net_device::trans_start (or netdev_queue::trans_start). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06doc, net: Update netdev operation namesBen Hutchings2-14/+14
Commits d314774cf2cd5dfeb39a00d37deee65d4c627927 ('netdev: network device operations infrastructure') and 008298231abbeb91bc7be9e8b078607b816d1a4a ('netdev: add more functions to netdevice ops') moved and renamed net device operation pointers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06doc, net: Update documentation of synchronisation for TX multiqueueBen Hutchings1-3/+3
Commits e308a5d806c852f56590ffdd3834d0df0cbed8d7 ('netdev: Add netdev->addr_list_lock protection.') and e8a0464cc950972824e2e128028ae3db666ec1ed ('netdev: Allocate multiple queues for TX.') introduced more fine-grained locks. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06doc, net: Remove obsolete reference to dev->pollBen Hutchings1-2/+1
Commit bea3348eef27e6044b6161fd04c3152215f96411 ('[NET]: Make NAPI polling independent of struct net_device objects.') removed the automatic disabling of NAPI polling by dev_close(), and drivers must now do this themselves. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06ethtool: Remove exception to the requirement of holding RTNL lockBen Hutchings1-2/+1
Commit e52ac3398c3d772d372b9b62ab408fd5eec96840 ('net: Use device model to get driver name in skb_gso_segment()') removed the only in-tree caller of ethtool ops that doesn't hold the RTNL lock. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06blackfin: update defconfig for bf527-ezkitBob Liu1-0/+1
To fix compile error: drivers/usb/musb/blackfin.h:51:3: error: #error "Please use PIO mode in MUSB driver on bf52x chip v0.0 and v0.1" make[4]: *** [drivers/usb/musb/blackfin.o] Error 1 Signed-off-by: Bob Liu <lliubbo@gmail.com>
2012-04-06blackfin: gpio: fix compile error if !CONFIG_GPIOLIBBob Liu1-2/+12
Add __gpio_get_value()/__gpio_set_value() to fix compile error if CONFIG_GPIOLIB = n. Signed-off-by: Bob Liu <lliubbo@gmail.com>
2012-04-06blackfin: fix L1 data A overflow link issueMike Frysinger1-1/+1
This patch fix below compile error: "bfin-uclinux-ld: L1 data A overflow!" It is due to the recent lib/gen_crc32table.c change: 46c5801eaf86e83cb3a4142ad35188db5011fff0 crc32: bolt on crc32c it added 8KiB more data to __cacheline_aligned which cause blackfin L1 data cache overflow. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Bob Liu <lliubbo@gmail.com>
2012-04-05mmc: use really long write timeout to deal with crappy cardsPaul Walmsley1-3/+7
Several people have noticed that crappy SD cards take much longer to complete multiple block writes than the 300ms that Linux specifies. Try to work around this by using a three second write timeout instead. This is a generalized version of a patch from Chase Maupin <Chase.Maupin@ti.com>, whose patch description said: * With certain SD cards timeouts like the following have been seen due to an improper calculation of the dto value: mmcblk0: error -110 transferring data, sector 4126233, nr 8, card status 0xc00 * By removing the dto calculation and setting the timeout value to the maximum specified by the SD card specification part A2 section 2.2.15 these timeouts can be avoided. * This change has been used by beagleboard users as well as the Texas Instruments SDK without a negative impact. * There are multiple discussion threads about this but the most relevant ones are: * http://talk.maemo.org/showthread.php?p=1000707#post1000707 * http://www.mail-archive.com/linux-omap@vger.kernel.org/msg42213.html * Original proposal for this fix was done by Sukumar Ghoral of Texas Instruments * Tested using a Texas Instruments AM335x EVM Signed-off-by: Paul Walmsley <paul@pwsan.com> Tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-05mmc: sdhci-dove: Fix compile error by including module.hAlf Høgemark1-0/+1
This patch fixes a compile error in drivers/mmc/host/sdhci-dove.c by including the linux/module.h file. Signed-off-by: Alf Høgemark <alf@i100.no> Cc: <stable@vger.kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-05mmc: Prevent 1.8V switch for SD hosts that don't support UHS modes.Al Cooper1-2/+3
The driver should not try to switch to 1.8V when the SD 3.0 host controller does not have any UHS capabilities bits set (SDR50, DDR50 or SDR104). See page 72 of "SD Specifications Part A2 SD Host Controller Simplified Specification Version 3.00" under "1.8V Signaling Enable". Instead of setting SDR12 and SDR25 in the host capabilities data structure for all V3.0 host controllers, only set them if SDR104, SDR50 or DDR50 is set in the host capabilities register. This will prevent the switch to 1.8V later. Signed-off-by: Al Cooper <acooper@gmail.com> Acked-by: Arindam Nath <arindam.nath@amd.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Girish K S <girish.shivananjappa@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-05Revert "mmc: sdhci-pci: Add MSI support"Chris Ball1-6/+0
This reverts commit e6039832bed9a9b967796d7021f17f25b625b616. There are reports of MSI breaking SDHCI on multiple chipsets (JMicron and O2Micro, at least), so this should be reverted until we come up with a whitelist or something. Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-05Revert "mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers"Chris Ball2-5/+1
This reverts commit c16e981b2fd9455af670a69a84f4c8cf07e12658, because it's no longer useful once MSI support is reverted. Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-05mmc: core: fix power class selectionSubhash Jadavani1-13/+17
mmc_select_powerclass() function returns error if eMMC VDD level supported by host is between 2.7v to 3.2v. According to eMMC specification, valid voltage for high voltage cards is 2.7v to 3.6v. This patch ensures that 2.7v to 3.6v VDD range is treated as valid range. Also, failure to set the power class shouldn't be treated as fatal error because even if setting the power class fails, card can still work in default power class. If mmc_select_powerclass() returns error, just print the warning message and go ahead with rest of the card initialization. Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org> Acked-by: Girish K S <girish.shivananjappa@linaro.org> Reviewed-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-05mmc: omap_hsmmc: fix module re-insertionBalaji T K1-3/+1
OMAP4 and OMAP3 HSMMC IP registers differ by 0x100 offset. Adding the offset to platform_device resource structure increments the start address for every insmod operation. MMC command fails on re-insertion as module due to incorrect register base. Fix this by updating the ioremap base address only. Signed-off-by: Balaji T K <balajitk@ti.com> Signed-off-by: Venkatraman S <svenkatr@ti.com> Signed-off-by: Chris Ball <cjb@laptop.org>