linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2014-08-29	Fix a bidirectional UDS connect check typo	Lukasz Pawelczyk	1	-2/+2
	The 54e70ec5eb090193b03e69d551fa6771a5a217c4 commit introduced a bidirectional check that should have checked for mutual WRITE access between two labels. Due to a typo the second check was incorrect. Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@samsung.com>
2014-08-29	Small fixes in comments describing function parameters	Lukasz Pawelczyk	1	-9/+9
	Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@samsung.com>
2014-08-28	Smack: Bring-up access mode	Casey Schaufler	5	-27/+294
	People keep asking me for permissive mode, and I keep saying "no". Permissive mode is wrong for more reasons than I can enumerate, but the compelling one is that it's once on, never off. Nonetheless, there is an argument to be made for running a process with lots of permissions, logging which are required, and then locking the process down. There wasn't a way to do that with Smack, but this provides it. The notion is that you start out by giving the process an appropriate Smack label, such as "ATBirds". You create rules with a wide range of access and the "b" mode. On Tizen it might be: ATBirds System rwxalb ATBirds User rwxalb ATBirds _ rwxalb User ATBirds wb System ATBirds wb Accesses that fail will generate audit records. Accesses that succeed because of rules marked with a "b" generate log messages identifying the rule, the program and as much object information as is convenient. When the system is properly configured and the programs brought in line with the labeling scheme the "b" mode can be removed from the rules. When the system is ready for production the facility can be configured out. This provides the developer the convenience of permissive mode without creating a system that looks like it is enforcing a policy while it is not. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-25	Smack: Fix setting label on successful file open	Marcin Niesluchowski	1	-1/+3
	While opening with CAP_MAC_OVERRIDE file label is not set. Other calls may access it after CAP_MAC_OVERRIDE is dropped from process. Signed-off-by: Marcin Niesluchowski <m.niesluchow@samsung.com>
2014-08-08	Smack: remove unneeded NULL-termination from securtity label	Konstantin Khlebnikov	1	-3/+3
	Values of extended attributes are stored as binary blobs. NULL-termination of them isn't required. It just wastes disk space and confuses command-line tools like getfattr because they have to print that zero byte at the end. This patch removes terminating zero byte from initial security label in smack_inode_init_security and cuts it out in function smack_inode_getsecurity which is used by syscall getxattr. This change seems completely safe, because function smk_parse_smack ignores everything after first zero byte. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
2014-08-08	Smack: handle zero-length security labels without panic	Konstantin Khlebnikov	2	-3/+3
	Zero-length security labels are invalid but kernel should handle them. This patch fixes kernel panic after setting zero-length security labels: # attr -S -s "SMACK64" -V "" file And after writing zero-length string into smackfs files syslog and onlycp: # python -c 'import os; os.write(1, "")' > /smack/syslog The problem is caused by brain-damaged logic in function smk_parse_smack() which takes pointer to buffer and its length but if length below or equal zero it thinks that the buffer is zero-terminated. Unfortunately callers of this function are widely used and proper fix requires serious refactoring. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
2014-08-08	Smack: fix behavior of smack_inode_listsecurity	Konstantin Khlebnikov	1	-5/+4
	Security operation ->inode_listsecurity is used for generating list of available extended attributes for syscall listxattr. Currently it's used only in nfs4 or if filesystem doesn't provide i_op->listxattr. The list is the set of NULL-terminated names, one after the other. This method must include zero byte at the and into result. Also this function must return length even if string does not fit into output buffer or it is NULL, see similar method in selinux and man listxattr. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
2014-08-03	X.509: Need to export x509_request_asymmetric_key()	David Howells	1	-0/+1
	Need to export x509_request_asymmetric_key() so that PKCS#7 can use it if compiled as a module. Reported-by: James Morris <jmorris@namei.org> Signed-off-by: David Howells <dhowells@redhat.com>
2014-08-01	netlabel: shorter names for the NetLabel catmap funcs/structs	Paul Moore	8	-157/+139
	Historically the NetLabel LSM secattr catmap functions and data structures have had very long names which makes a mess of the NetLabel code and anyone who uses NetLabel. This patch renames the catmap functions and structures from "_secattr_catmap_" to just "_catmap_" which improves things greatly. There are no substantial code or logic changes in this patch. Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-01	netlabel: fix the catmap walking functions	Paul Moore	1	-48/+54
	The two NetLabel LSM secattr catmap walk functions didn't handle certain edge conditions correctly, causing incorrect security labels to be generated in some cases. This patch corrects these problems and converts the functions to use the new _netlbl_secattr_catmap_getnode() function in order to reduce the amount of repeated code. Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-01	netlabel: fix the horribly broken catmap functions	Paul Moore	5	-146/+240
	The NetLabel secattr catmap functions, and the SELinux import/export glue routines, were broken in many horrible ways and the SELinux glue code fiddled with the NetLabel catmap structures in ways that we probably shouldn't allow. At some point this "worked", but that was likely due to a bit of dumb luck and sub-par testing (both inflicted by yours truly). This patch corrects these problems by basically gutting the code in favor of something less obtuse and restoring the NetLabel abstractions in the SELinux catmap glue code. Everything is working now, and if it decides to break itself in the future this code will be much easier to debug than the code it replaces. One noteworthy side effect of the changes is that it is no longer necessary to allocate a NetLabel catmap before calling one of the NetLabel APIs to set a bit in the catmap. NetLabel will automatically allocate the catmap nodes when needed, resulting in less allocations when the lowest bit is greater than 255 and less code in the LSMs. Cc: stable@vger.kernel.org Reported-by: Christian Evans <frodox@zoho.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-01	netlabel: fix a problem when setting bits below the previously lowest bit	Paul Moore	4	-16/+26
	The NetLabel category (catmap) functions have a problem in that they assume categories will be set in an increasing manner, e.g. the next category set will always be larger than the last. Unfortunately, this is not a valid assumption and could result in problems when attempting to set categories less than the startbit in the lowest catmap node. In some cases kernel panics and other nasties can result. This patch corrects the problem by checking for this and allocating a new catmap node instance and placing it at the front of the list. Cc: stable@vger.kernel.org Reported-by: Christian Evans <frodox@zoho.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-07-31	PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1	David Howells	1	-4/+2
	X.509 certificate issuer and subject fields are mandatory fields in the ASN.1 and so their existence needn't be tested for. They are guaranteed to end up with an empty string if the name material has nothing we can use (see x509_fabricate_name()). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2014-07-29	tpm: simplify code by using %*phN specifier	Andy Shevchenko	1	-3/+1
	Instead of looping by ourselves we may use %*phN specifier to dump a small buffer. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [ PHuewe: removed now unused variable i ] Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2014-07-29	tpm: Provide a generic means to override the chip returned timeouts	Jason Gunthorpe	3	-21/+75
	Some Atmel TPMs provide completely wrong timeouts from their TPM_CAP_PROP_TIS_TIMEOUT query. This patch detects that and returns new correct values via a DID/VID table in the TIS driver. Tested on ARM using an AT97SC3204T FW version 37.16 Cc: <stable@vger.kernel.org> [PHuewe: without this fix these 'broken' Atmel TPMs won't function on older kernels] Signed-off-by: "Berg, Christopher" <Christopher.Berg@atmel.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2014-07-29	tpm: missing tpm_chip_put in tpm_get_random()	Jarkko Sakkinen	1	-3/+4
	Regression in 41ab999c. Call to tpm_chip_put is missing. This will cause TPM device driver not to unload if tmp_get_random() is called. Cc: <stable@vger.kernel.org> # 3.7+ Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2014-07-29	tpm: Properly clean sysfs entries in error path	Stefan Berger	1	-1/+3
	Properly clean the sysfs entries in the error path Cc: <stable@vger.kernel.org> Reported-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2014-07-29	tpm: Add missing tpm_do_selftest to ST33 I2C driver	Jason Gunthorpe	1	-0/+1
	Most device drivers do call 'tpm_do_selftest' which executes a TPM_ContinueSelfTest. tpm_i2c_stm_st33 is just pointlessly different, I think it is bug. These days we have the general assumption that the TPM is usable by the kernel immediately after the driver is finished, so we can no longer defer the mandatory self test to userspace. Cc: <stable@vger.kernel.org> # 3.12+ Reported-by: Richard Marciel <rmaciel@linux.vnet.ibm.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2014-07-29	PKCS#7: Use x509_request_asymmetric_key()	David Howells	3	-72/+29
	pkcs7_request_asymmetric_key() and x509_request_asymmetric_key() do the same thing, the latter being a copy of the former created by the IMA folks, so drop the PKCS#7 version as the X.509 location is more general. Whilst we're at it, rename the arguments of x509_request_asymmetric_key() to better reflect what the values being passed in are intended to match on an X.509 cert. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-07-28	Revert "selinux: fix the default socket labeling in sock_graft()"	Paul Moore	2	-15/+3
	This reverts commit 4da6daf4d3df5a977e4623963f141a627fd2efce. Unfortunately, the commit in question caused problems with Bluetooth devices, specifically it caused them to get caught in the newly created BUG_ON() check. The AF_ALG problem still exists, but will be addressed in a future patch. Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-07-28	X.509: x509_request_asymmetric_keys() doesn't need string length arguments	David Howells	1	-6/+3
	x509_request_asymmetric_keys() doesn't need the lengths of the NUL-terminated strings passing in as it can work that out for itself. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-07-28	PKCS#7: fix sparse non static symbol warning	Wei Yongjun	1	-1/+1
	Fixes the following sparse warnings: crypto/asymmetric_keys/pkcs7_key_type.c:73:17: warning: symbol 'key_type_pkcs7' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David Howells <dhowells@redhat.com>
2014-07-28	KEYS: revert encrypted key change	Mimi Zohar	1	-1/+1
	Commit fc7c70e "KEYS: struct key_preparsed_payload should have two payload pointers" erroneously modified encrypted-keys. This patch reverts the change to that file. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com>
2014-07-25	ima: add support for measuring and appraising firmware	Mimi Zohar	8	-5/+50
	The "security: introduce kernel_fw_from_file hook" patch defined a new security hook to evaluate any loaded firmware that wasn't built into the kernel. This patch defines ima_fw_from_file(), which is called from the new security hook, to measure and/or appraise the loaded firmware's integrity. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2014-07-25	firmware_class: perform new LSM checks	Kees Cook	1	-4/+26
	This attaches LSM hooks to the existing firmware loading interfaces: filesystem-found firmware and demand-loaded blobs. On errors, loads are aborted and the failure code is returned to userspace. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Takashi Iwai <tiwai@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-25	security: introduce kernel_fw_from_file hook	Kees Cook	3	-0/+29
	In order to validate the contents of firmware being loaded, there must be a hook to evaluate any loaded firmware that wasn't built into the kernel itself. Without this, there is a risk that a root user could load malicious firmware designed to mount an attack against kernel memory (e.g. via DMA). Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Takashi Iwai <tiwai@suse.de>
2014-07-25	PKCS#7: Missing inclusion of linux/err.h	David Howells	1	-0/+1
	crypto/asymmetric_keys/pkcs7_key_type.c needs to #include linux/err.h rather than relying on getting it through other headers. Without this, the powerpc allyesconfig build fails. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David Howells <dhowells@redhat.com>
2014-07-24	CAPABILITIES: remove undefined caps from all processes	Eric Paris	5	-12/+13
	This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744 plus fixing it a different way... We found, when trying to run an application from an application which had dropped privs that the kernel does security checks on undefined capability bits. This was ESPECIALLY difficult to debug as those undefined bits are hidden from /proc/$PID/status. Consider a root application which drops all capabilities from ALL 4 capability sets. We assume, since the application is going to set eff/perm/inh from an array that it will clear not only the defined caps less than CAP_LAST_CAP, but also the higher 28ish bits which are undefined future capabilities. The BSET gets cleared differently. Instead it is cleared one bit at a time. The problem here is that in security/commoncap.c::cap_task_prctl() we actually check the validity of a capability being read. So any task which attempts to 'read all things set in bset' followed by 'unset all things set in bset' will not even attempt to unset the undefined bits higher than CAP_LAST_CAP. So the 'parent' will look something like: CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: ffffffc000000000 All of this 'should' be fine. Given that these are undefined bits that aren't supposed to have anything to do with permissions. But they do... So lets now consider a task which cleared the eff/perm/inh completely and cleared all of the valid caps in the bset (but not the invalid caps it couldn't read out of the kernel). We know that this is exactly what the libcap-ng library does and what the go capabilities library does. They both leave you in that above situation if you try to clear all of you capapabilities from all 4 sets. If that root task calls execve() the child task will pick up all caps not blocked by the bset. The bset however does not block bits higher than CAP_LAST_CAP. So now the child task has bits in eff which are not in the parent. These are 'meaningless' undefined bits, but still bits which the parent doesn't have. The problem is now in cred_cap_issubset() (or any operation which does a subset test) as the child, while a subset for valid cap bits, is not a subset for invalid cap bits! So now we set durring commit creds that the child is not dumpable. Given it is 'more priv' than its parent. It also means the parent cannot ptrace the child and other stupidity. The solution here: 1) stop hiding capability bits in status This makes debugging easier! 2) stop giving any task undefined capability bits. it's simple, it you don't put those invalid bits in CAP_FULL_SET you won't get them in init and you won't get them in any other task either. This fixes the cap_issubset() tests and resulting fallout (which made the init task in a docker container untraceable among other things) 3) mask out undefined bits when sys_capset() is called as it might use ~0, ~0 to denote 'all capabilities' for backward/forward compatibility. This lets 'capsh --caps="all=eip" -- -c /bin/bash' run. 4) mask out undefined bit when we read a file capability off of disk as again likely all bits are set in the xattr for forward/backward compatibility. This lets 'setcap all+pe /bin/bash; /bin/bash' run Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Dan Walsh <dwalsh@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
2014-07-24	commoncap: don't alloc the credential unless needed in cap_task_prctl	Tetsuo Handa	1	-42/+30
	In function cap_task_prctl(), we would allocate a credential unconditionally and then check if we support the requested function. If not we would release this credential with abort_creds() by using RCU method. But on some archs such as powerpc, the sys_prctl is heavily used to get/set the floating point exception mode. So the unnecessary allocating/releasing of credential not only introduce runtime overhead but also do cause OOM due to the RCU implementation. This patch removes abort_creds() from cap_task_prctl() by calling prepare_creds() only when we need to modify it. Reported-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Paul Moore <paul@paul-moore.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2014-07-22	KEYS: request_key_auth: Provide key preparsing	David Howells	1	-0/+13
	Provide key preparsing for the request_key_auth key type so that we can make preparsing mandatory. This does nothing as this type can only be set up internally to the kernel. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com>
2014-07-22	KEYS: keyring: Provide key preparsing	David Howells	1	-11/+23
	Provide key preparsing in the keyring so that we can make preparsing mandatory. For keyrings, however, only an empty payload is permitted. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com>
2014-07-22	KEYS: big_key: Use key preparsing	David Howells	2	-17/+27
	Make use of key preparsing in the big key type so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com>
2014-07-22	KEYS: RxRPC: Use key preparsing	David Howells	1	-68/+97
	Make use of key preparsing in the RxRPC protocol so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com>
2014-07-22	KEYS: DNS: Use key preparsing	David Howells	1	-18/+25
	Make use of key preparsing in the DNS resolver so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com>
2014-07-22	KEYS: Ceph: Use user_match()	David Howells	1	-6/+2
	Ceph can use user_match() instead of defining its own identical function. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> cc: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2014-07-22	KEYS: Ceph: Use key preparsing	David Howells	1	-9/+15
	Make use of key preparsing in Ceph so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> cc: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2014-07-22	KEYS: user: Use key preparsing	David Howells	3	-22/+30
	Make use of key preparsing in user-defined and logon keys so that quota size determination can take place prior to keyring locking when a key is being added. Also the idmapper key types need to change to match as they use the user-defined key type routines. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com>
2014-07-22	KEYS: Call ->free_preparse() even after ->preparse() returns an error	David Howells	2	-6/+7
	Call the ->free_preparse() key type op even after ->preparse() returns an error as it does cleaning up type stuff. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-22	KEYS: Allow expiry time to be set when preparsing a key	David Howells	3	-3/+16
	Allow a key type's preparsing routine to set the expiry time for a key. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-22	KEYS: struct key_preparsed_payload should have two payload pointers	David Howells	5	-6/+8
	struct key_preparsed_payload should have two payload pointers to correspond with those in struct key. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-18	seccomp: implement SECCOMP_FILTER_FLAG_TSYNC	Kees Cook	4	-2/+140
	Applying restrictive seccomp filter programs to large or diverse codebases often requires handling threads which may be started early in the process lifetime (e.g., by code that is linked in). While it is possible to apply permissive programs prior to process start up, it is difficult to further restrict the kernel ABI to those threads after that point. This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for synchronizing thread group seccomp filters at filter installation time. When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, filter) an attempt will be made to synchronize all threads in current's threadgroup to its new seccomp filter program. This is possible iff all threads are using a filter that is an ancestor to the filter current is attempting to synchronize to. NULL filters (where the task is running as SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS, ...) has been set on the calling thread, no_new_privs will be set for all synchronized threads too. On success, 0 is returned. On failure, the pid of one of the failing threads will be returned and no filters will have been applied. The race conditions against another thread are: - requesting TSYNC (already handled by sighand lock) - performing a clone (already handled by sighand lock) - changing its filter (already handled by sighand lock) - calling exec (handled by cred_guard_mutex) The clone case is assisted by the fact that new threads will have their seccomp state duplicated from their parent before appearing on the tasklist. Holding cred_guard_mutex means that seccomp filters cannot be assigned while in the middle of another thread's exec (potentially bypassing no_new_privs or similar). The call to de_thread() may kill threads waiting for the mutex. Changes across threads to the filter pointer includes a barrier. Based on patches by Will Drewry. Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18	seccomp: allow mode setting across threads	Kees Cook	1	-11/+25
	This changes the mode setting helper to allow threads to change the seccomp mode from another thread. We must maintain barriers to keep TIF_SECCOMP synchronized with the rest of the seccomp state. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18	seccomp: introduce writer locking	Kees Cook	3	-5/+66
	Normally, task_struct.seccomp.filter is only ever read or modified by the task that owns it (current). This property aids in fast access during system call filtering as read access is lockless. Updating the pointer from another task, however, opens up race conditions. To allow cross-thread filter pointer updates, writes to the seccomp fields are now protected by the sighand spinlock (which is shared by all threads in the thread group). Read access remains lockless because pointer updates themselves are atomic. However, writes (or cloning) often entail additional checking (like maximum instruction counts) which require locking to perform safely. In the case of cloning threads, the child is invisible to the system until it enters the task list. To make sure a child can't be cloned from a thread and left in a prior state, seccomp duplication is additionally moved under the sighand lock. Then parent and child are certain have the same seccomp state when they exit the lock. Based on patches by Will Drewry and David Drysdale. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18	seccomp: split filter prep from check and apply	Kees Cook	1	-30/+67
	In preparation for adding seccomp locking, move filter creation away from where it is checked and applied. This will allow for locking where no memory allocation is happening. The validation, filter attachment, and seccomp mode setting can all happen under the future locks. For extreme defensiveness, I've added a BUG_ON check for the calculated size of the buffer allocation in case BPF_MAXINSN ever changes, which shouldn't ever happen. The compiler should actually optimize out this check since the test above it makes it impossible. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18	sched: move no_new_privs into new atomic flags	Kees Cook	5	-10/+22
	Since seccomp transitions between threads requires updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18	MIPS: add seccomp syscall	Kees Cook	5	-6/+13
	Wires up the new seccomp syscall. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
2014-07-18	ARM: add seccomp syscall	Kees Cook	2	-0/+2
	Wires up the new seccomp syscall. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
2014-07-18	seccomp: add "seccomp" syscall	Kees Cook	8	-6/+65
	This adds the new "seccomp" syscall with both an "operation" and "flags" parameter for future expansion. The third argument is a pointer value, used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...). In addition to the TSYNC flag later in this patch series, there is a non-zero chance that this syscall could be used for configuring a fixed argument area for seccomp-tracer-aware processes to pass syscall arguments in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter" for this syscall. Additionally, this syscall uses operation, flags, and user pointer for arguments because strictly passing arguments via a user pointer would mean seccomp itself would be unable to trivially filter the seccomp syscall itself. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18	seccomp: split mode setting routines	Kees Cook	1	-23/+48
	Separates the two mode setting paths to make things more readable with fewer #ifdefs within function bodies. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-07-18	seccomp: extract check/assign mode helpers	Kees Cook	1	-4/+18
	To support splitting mode 1 from mode 2, extract the mode checking and assignment logic into common functions. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>