linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2013-03-27	userns: Restrict when proc and sysfs can be mounted	Eric W. Biederman	1	-0/+21
	Only allow unprivileged mounts of proc and sysfs if they are already mounted when the user namespace is created. proc and sysfs are interesting because they have content that is per namespace, and so fresh mounts are needed when new namespaces are created while at the same time proc and sysfs have content that is shared between every instance. Respect the policy of who may see the shared content of proc and sysfs by only allowing new mounts if there was an existing mount at the time the user namespace was created. In practice there are only two interesting cases: proc and sysfs are mounted at their usual places, proc and sysfs are not mounted at all (some form of mount namespace jail). Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-27	vfs: Carefully propogate mounts across user namespaces	Eric W. Biederman	1	-1/+5
	As a matter of policy MNT_READONLY should not be changable if the original mounter had more privileges than creator of the mount namespace. Add the flag CL_UNPRIVILEGED to note when we are copying a mount from a mount namespace that requires more privileges to a mount namespace that requires fewer privileges. When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT if any of the mnt flags that should never be changed are set. This protects both mount propagation and the initial creation of a less privileged mount namespace. Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-27	vfs: Add a mount flag to lock read only bind mounts	Eric W. Biederman	1	-0/+3
	When a read-only bind mount is copied from mount namespace in a higher privileged user namespace to a mount namespace in a lesser privileged user namespace, it should not be possible to remove the the read-only restriction. Add a MNT_LOCK_READONLY mount flag to indicate that a mount must remain read-only. CC: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-27	userns: Don't allow creation if the user is chrooted	Eric W. Biederman	1	-0/+24
	Guarantee that the policy of which files may be access that is established by setting the root directory will not be violated by user namespaces by verifying that the root directory points to the root of the mount namespace at the time of user namespace creation. Changing the root is a privileged operation, and as a matter of policy it serves to limit unprivileged processes to files below the current root directory. For reasons of simplicity and comprehensibility the privilege to change the root directory is gated solely on the CAP_SYS_CHROOT capability in the user namespace. Therefore when creating a user namespace we must ensure that the policy of which files may be access can not be violated by changing the root directory. Anyone who runs a processes in a chroot and would like to use user namespace can setup the same view of filesystems with a mount namespace instead. With this result that this is not a practical limitation for using user namespaces. Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-22	new helper: file_inode(file)	Al Viro	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22	mount: consolidate permission checks	Al Viro	1	-33/+7
	... and ask for global CAP_SYS_ADMIN only for superblock-level remounts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22	get rid of unprotected dereferencing of mnt->mnt_ns	Al Viro	1	-12/+17
	It's safe only under namespace_sem or vfsmount_lock; all places in fs/namespace.c that want mnt->mnt_ns->user_ns actually want to use current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in there). Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20	vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flags	Miao Xie	1	-1/+1
	The compiler may optimize the while loop and make the check just be done once, so we should use ACCESS_ONCE() to guard access to ->mnt_flags Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-14	userns: Require CAP_SYS_ADMIN for most uses of setns.	Eric W. Biederman	1	-1/+2
	Andy Lutomirski <luto@amacapital.net> found a nasty little bug in the permissions of setns. With unprivileged user namespaces it became possible to create new namespaces without privilege. However the setns calls were relaxed to only require CAP_SYS_ADMIN in the user nameapce of the targed namespace. Which made the following nasty sequence possible. pid = clone(CLONE_NEWUSER \| CLONE_NEWNS); if (pid == 0) { /* child / system("mount --bind /home/me/passwd /etc/passwd"); } else if (pid != 0) { / parent */ char path[PATH_MAX]; snprintf(path, sizeof(path), "/proc/%u/ns/mnt"); fd = open(path, O_RDONLY); setns(fd, 0); system("su -"); } Prevent this possibility by requiring CAP_SYS_ADMIN in the current user namespace when joing all but the user namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20	proc: Usable inode numbers for the namespace file descriptors.	Eric W. Biederman	1	-0/+14
	Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowd by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19	userns: fix return value on mntns_install() failure	Zhao Hongjiang	1	-1/+1
	Change return value from -EINVAL to -EPERM when the permission check fails. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19	vfs: Allow unprivileged manipulation of the mount namespace.	Eric W. Biederman	1	-26/+43
	- Add a filesystem flag to mark filesystems that are safe to mount as an unprivileged user. - Add a filesystem flag to mark filesystems that don't need MNT_NODEV when mounted by an unprivileged user. - Relax the permission checks to allow unprivileged users that have CAP_SYS_ADMIN permissions in the user namespace referred to by the current mount namespace to be allowed to mount, unmount, and move filesystems. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19	vfs: Only support slave subtrees across different user namespaces	Eric W. Biederman	1	-3/+8
	Sharing mount subtress with mount namespaces created by unprivileged users allows unprivileged mounts created by unprivileged users to propagate to mount namespaces controlled by privileged users. Prevent nasty consequences by changing shared subtrees to slave subtress when an unprivileged users creates a new mount namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19	vfs: Add a user namespace reference from struct mnt_namespace	Eric W. Biederman	1	-8/+16
	This will allow for support for unprivileged mounts in a new user namespace. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19	vfs: Add setns support for the mount namespace	Eric W. Biederman	1	-0/+95
	setns support for the mount namespace is a little tricky as an arbitrary decision must be made about what to set fs->root and fs->pwd to, as there is no expectation of a relationship between the two mount namespaces. Therefore I arbitrarily find the root mount point, and follow every mount on top of it to find the top of the mount stack. Then I set fs->root and fs->pwd to that location. The topmost root of the mount stack seems like a reasonable place to be. Bind mount support for the mount namespace inodes has the possibility of creating circular dependencies between mount namespaces. Circular dependencies can result in loops that prevent mount namespaces from every being freed. I avoid creating those circular dependencies by adding a sequence number to the mount namespace and require all bind mounts be of a younger mount namespace into an older mount namespace. Add a helper function proc_ns_inode so it is possible to detect when we are attempting to bind mound a namespace inode. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-10-12	vfs: define struct filename and have getname() return it	Jeff Layton	1	-2/+2
	getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-11	consitify do_mount() arguments	Al Viro	1	-6/+6
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-22	do_add_mount()/umount -l races	Al Viro	1	-2/+8
	normally we deal with lock_mount()/umount races by checking that mountpoint to be is still in our namespace after lock_mount() has been done. However, do_add_mount() skips that check when called with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The reason is that ->mnt_ns may be a temporary namespace created exactly to contain automounts a-la NFS4 referral handling. It's not the namespace of the caller, though, so check_mnt() would fail here. We still need to check that ->mnt_ns is non-NULL in that case, though. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31	fs: Add freezing handling to mnt_want_write() / mnt_drop_write()	Jan Kara	1	-20/+77
	Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14	VFS: Comment mount following code	David Howells	1	-2/+14
	Add comments describing what the directions "up" and "down" mean and ref count handling to the VFS mount following family of functions. Signed-off-by: Valerie Aurora <vaurora@redhat.com> (Original author) Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14	VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors	David Howells	1	-55/+65
	copy_tree() can theoretically fail in a case other than ENOMEM, but always returns NULL which is interpreted by callers as -ENOMEM. Change it to return an explicit error. Also change clone_mnt() for consistency and because union mounts will add new error cases. Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix. [AV: folded braino fix by Dan Carpenter] Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Valerie Aurora <valerie.aurora@gmail.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14	get rid of magic in proc_namespace.c	Al Viro	1	-3/+3
	don't rely on proc_mounts->m being the first field; container_of() is there for purpose. No need to bother with ->private, while we are at it - the same container_of will do nicely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14	get rid of ->mnt_longterm	Al Viro	1	-46/+7
	it's enough to set ->mnt_ns of internal vfsmounts to something distinct from all struct mnt_namespace out there; then we can just use the check for ->mnt_ns != NULL in the fast path of mntput_no_expire() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30	vfs: umount_tree() might be called on subtree that had never made it	Al Viro	1	-1/+2
	__mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm() we'd done back when we set ->mnt_ns non-NULL; it should not be done to vfsmounts that had never gone through commit_tree() and friends. Kudos to lczerner for catching that one... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29	brlocks/lglocks: API cleanups	Andi Kleen	1	-69/+70
	lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. In preparation, this patch changes the API to look more like normal function calls with pointers, not magic macros. The patch is rather large because I move over all users in one go to keep it bisectable. This impacts the VFS somewhat in terms of lines changed. But no actual behaviour change. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	Linus Torvalds	1	-2/+0
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
2012-01-06	vfs: prevent remount read-only if pending removes	Miklos Szeredi	1	-0/+7
	If there are any inodes on the super block that have been unlinked (i_nlink == 0) but have not yet been deleted then prevent the remounting the super block read-only. Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06	vfs: protect remounting superblock read-only	Miklos Szeredi	1	-1/+39
	Currently remouting superblock read-only is racy in a major way. With the per mount read-only infrastructure it is now possible to prevent most races, which this patch attempts. Before starting the remount read-only, iterate through all mounts belonging to the superblock and if none of them have any pending writes, set sb->s_readonly_remount. This indicates that remount is in progress and no further write requests are allowed. If the remount succeeds set MS_RDONLY and reset s_readonly_remount. If the remounting is unsuccessful just reset s_readonly_remount. This can result in transient EROFS errors, despite the fact the remount failed. Unfortunately hodling off writes is difficult as remount itself may touch the filesystem (e.g. through load_nls()) which would deadlock. A later patch deals with delayed writes due to nlink going to zero. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06	vfs: keep list of mounts for each superblock	Miklos Szeredi	1	-0/+7
	Keep track of vfsmounts belonging to a superblock. List is protected by vfsmount_lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06	vfs: switch ->show_options() to struct dentry *	Al Viro	1	-2/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: trim includes a bit	Al Viro	1	-19/+6
	[folded fix for missing magic.h from Tetsuo Handa] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	switch mnt_namespace ->root to struct mount	Al Viro	1	-6/+6
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: take /proc/*/mounts and friends to fs/proc_namespace.c	Al Viro	1	-211/+7
	rationale: that stuff is far tighter bound to fs/namespace.c than to the guts of procfs proper. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: opencode mntget() mnt_set_mountpoint()	Al Viro	1	-1/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: spread struct mount - remaining argument of next_mnt()	Al Viro	1	-17/+18
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: move fsnotify junk to struct mount	Al Viro	1	-23/+22
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: move mnt_devname	Al Viro	1	-9/+9
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: move mnt_list to struct mount	Al Viro	1	-23/+24
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: switch pnode.h macros to struct mount *	Al Viro	1	-21/+21
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: move the rest of int fields to struct mount	Al Viro	1	-15/+17
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: mnt_id/mnt_group_id moved	Al Viro	1	-15/+15
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: mnt_ns moved to struct mount	Al Viro	1	-22/+23
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: spread struct mount - mntput_no_expire	Al Viro	1	-6/+7
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: spread struct mount - do_add_mount and graft_tree	Al Viro	1	-11/+11
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mount	Al Viro	1	-20/+21
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: and now we can make ->mnt_master point to struct mount	Al Viro	1	-2/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: take mnt_master to struct mount	Al Viro	1	-5/+5
	make IS_MNT_SLAVE take struct mount * at the same time Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: spread struct mount - remaining argument of mnt_set_mountpoint()	Al Viro	1	-4/+4
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: spread struct mount - propagate_mnt()	Al Viro	1	-6/+6
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03	vfs: spread struct mount - get_dominating_id / do_make_slave	Al Viro	1	-1/+1
	next pile of horrors, similar to mnt_parent one; this time it's mnt_master. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>