diff options
author | 2025-02-25 08:02:21 +0100 | |
---|---|---|
committer | 2025-03-04 09:29:54 +0100 | |
commit | 53dde6b6b2557be949f1fc91b99ff534b697ed21 (patch) | |
tree | 0ce6ca07af18ac178bce92406616519de6aeb3f4 | |
parent | Merge patch series "CONFIG_DEBUG_VFS at last" (diff) | |
parent | selftests: seventh test for mounting detached mounts onto detached mounts (diff) | |
download | wireguard-linux-53dde6b6b2557be949f1fc91b99ff534b697ed21.tar.xz wireguard-linux-53dde6b6b2557be949f1fc91b99ff534b697ed21.zip |
Merge patch series "fs: expand abilities of anonymous mount namespaces"
Christian Brauner <brauner@kernel.org> says:
This series expands the abilities of anonymous mount namespaces.
Terminology
===========
detached mount:
A detached mount is a mount belonging to an anonymous mount namespace.
anonymous mount namespace:
An anonymous mount namespace is a mount namespace that does not appear
in nsfs. This means neither can it be setns()ed into nor can it be
persisted through bind-mounts.
attached mount:
An attached mount is a mount belonging to a non-anonymous mount
namespace.
non-anonymous mount namespace:
A non-anonymous mount namespace is a mount namespace that does appear
in nsfs. This means it can be setns()ed into and can be persisted
through bind-mounts.
mount namespace sequence number:
Each non-anonymous mount namespace has a unique 64bit sequence number
that is assigned when the mount namespace is created. The sequence
number uniquely identifies a non-anonymous mount namespace.
One of the purposes of the sequence number is to prevent mount
namespace loops. These can occur when an nsfs mount namespace file is
bind mounted into a mount namespace that was created after it.
Such loops are prevented by verifying that the sequence number of the
target mount namespace is smaller than the sequence number of the nsfs
mount namespace file. In other words, the target mount namespace must
have been created before the nsfs file mount namespace.
In contrast, anonymous mount namespaces don't have a sequence number
assigned. Anonymous mount namespaces do not appear in any nsfs
instances and can thus not be pinned by bind-mounting them anywhere.
They can thus not be used to form cycles.
Creating detached mounts from detached mounts
=============================================
Currently, detached mounts can only be created from attached mounts.
This limitaton prevents various use-cases. For example, the ability to
mount a subdirectory without ever having to make the whole filesystem
visible first.
The current permission model for OPEN_TREE_CLONE flag of the open_tree()
system call is:
(1) Check that the caller is privileged over the owning user namespace
of it's current mount namespace.
(2) Check that the caller is located in the mount namespace of the mount
it wants to create a detached copy of.
While it is not strictly necessary to do it this way it is consistently
applied in the new mount api. This model will also be used when allowing
the creation of detached mount from another detached mount.
The (1) requirement can simply be met by performing the same check as
for the non-detached case, i.e., verify that the caller is privileged
over its current mount namespace.
To meet the (2) requirement it must be possible to infer the origin
mount namespace that the anonymous mount namespace of the detached mount
was created from.
The origin mount namespace of an anonymous mount is the mount namespace
that the mounts that were copied into the anonymous mount namespace
originate from.
In order to check the origin mount namespace of an anonymous mount
namespace two methods come to mind:
(i) stash a reference to the original mount namespace in the anonymous
mount namespace
(ii) record the sequence number of the original mount namespace in the
anonymous mount namespace
The (i) option has more complicated consequences and implications than
(ii). For example, it would pin the origin mount namespace. Even with a
passive reference it would pointlessly pin memory as access to the
origin mount namespace isn't required.
With (ii) in place it is possible to perform an equivalent check (2') to
(2). The origin mount namespace of the anonymous mount namespace must be
the same as the caller's mount namespace. To establish this the sequence
number of the caller's mount namespace and the origin sequence number of
the anonymous mount namespace are compared.
The caller is always located in a non-anonymous mount namespace since
anonymous mount namespaces cannot be setns()ed into. The caller's mount
namespace will thus always have a valid sequence number.
The owning namespace of any mount namespace, anonymous or non-anonymous,
can never change. A mount attached to a non-anonymous mount namespace
can never change mount namespace.
If the sequence number of the non-anonymous mount namespace and the
origin sequence number of the anonymous mount namespace match, the
owning namespaces must match as well.
Hence, the capability check on the owning namespace of the caller's
mount namespace ensures that the caller has the ability to copy the
mount tree.
Mounting detached mounts onto detached mounts
=============================================
Currently, detached mounts can only be mounted onto attached mounts.
This limitation makes it impossible to assemble a new private rootfs and
move it into place. Instead, a detached tree must be created, attached,
then mounted open and then either moved or detached again. Lift this
restriction.
In order to allow mounting detached mounts onto other detached mounts
the same permission model used for creating detached mounts from
detached mounts can be used (cf. above).
Allowing to mount detached mounts onto detached mounts leaves three
cases to consider:
(1) The source mount is an attached mount and the target mount is a
detached mount. This would be equivalent to moving a mount between
different mount namespaces. A caller could move an attached mount to
a detached mount. The detached mount can now be freely attached to
any mount namespace. This changes the current delegatioh model
significantly for no good reason. So this will fail.
(2) Anonymous mount namespaces are always attached fully, i.e., it is
not possible to only attach a subtree of an anoymous mount
namespace. This simplifies the implementation and reasoning.
Consequently, if the anonymous mount namespace of the source
detached mount and the target detached mount are the identical the
mount request will fail.
(3) The source mount's anonymous mount namespace is different from the
target mount's anonymous mount namespace.
In this case the source anonymous mount namespace of the source
mount tree must be freed after its mounts have been moved to the
target anonymous mount namespace. The source anonymous mount
namespace must be empty afterwards.
By allowing to mount detached mounts onto detached mounts a caller may
do the following:
fd_tree1 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE)
fd_tree2 = open_tree(-EBADF, "/tmp", OPEN_TREE_CLONE)
fd_tree1 and fd_tree2 refer to two different detached mount trees that
belong to two different anonymous mount namespace.
It is important to note that fd_tree1 and fd_tree2 both refer to the
root of their respective anonymous mount namespaces.
By allowing to mount detached mounts onto detached mounts the caller
may now do:
move_mount(fd_tree1, "", fd_tree2, "",
MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH)
This will cause the detached mount referred to by fd_tree1 to be
mounted on top of the detached mount referred to by fd_tree2.
Thus, the detached mount fd_tree1 is moved from its separate anonymous
mount namespace into fd_tree2's anonymous mount namespace.
It also means that while fd_tree2 continues to refer to the root of
its respective anonymous mount namespace fd_tree1 doesn't anymore.
This has the consequence that only fd_tree2 can be moved to another
anonymous or non-anonymous mount namespace. Moving fd_tree1 will now
fail as fd_tree1 doesn't refer to the root of an anoymous mount
namespace anymore.
Now fd_tree1 and fd_tree2 refer to separate detached mount trees
referring to the same anonymous mount namespace.
This is conceptually fine. The new mount api does allow for this to
happen already via:
mount -t tmpfs tmpfs /mnt
mkdir -p /mnt/A
mount -t tmpfs tmpfs /mnt/A
fd_tree3 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE | AT_RECURSIVE)
fd_tree4 = open_tree(-EBADF, "/mnt/A", 0)
Both fd_tree3 and fd_tree4 refer to two different detached mount trees
but both detached mount trees refer to the same anonymous mount
namespace. An as with fd_tree1 and fd_tree2, only fd_tree3 may be
moved another mount namespace as fd_tree3 refers to the root of the
anonymous mount namespace just while fd_tree4 doesn't.
However, there's an important difference between the fd_tree3/fd_tree4
and the fd_tree1/fd_tree2 example.
Closing fd_tree4 and releasing the respective struct file will have no
further effect on fd_tree3's detached mount tree.
However, closing fd_tree3 will cause the mount tree and the respective
anonymous mount namespace to be destroyed causing the detached mount
tree of fd_tree4 to be invalid for further mounting.
By allowing to mount detached mounts on detached mounts as in the
fd_tree1/fd_tree2 example both struct files will affect each other.
Both fd_tree1 and fd_tree2 refer to struct files that have
FMODE_NEED_UNMOUNT set.
When either one of them is closed it ends up unmounting the mount
tree. The problem is that both will unconditionally free the mount
namespace and may end up causing UAFs for each other.
Another problem stems from the fact that fd_tree1 doesn't refer to the
root of the anonymous mount namespace. So ignoring the UAF issue, if
fd_tree2 were to be closed after fd_tree1, then fd_tree1 would free
only a part of the mount tree while leaking the rest of the mount
tree.
Multiple solutions for this problem come to mind:
(1) Reference Counting Anonymous Mount Namespaces
A solution to this problem would be reference counting anonymous
mount namespaces. The source detached mount tree acquires a
reference when it is moved into the anonymous mount namespace of
the target mount tree. When fd_tree1 is closed the mount tree
isn't unmounted and the anonymous mount namespace shared between
the detached mount tree at fd_tree1 and fd_tree2 isn't freed.
However, this has another problem. When fd_tree2 is closed before
fd_tree1 then closing fd_tree1 will cause the mount tree to be
unmounted and the anonymous mount namespace to be destroyed.
However, fd_tree1 only refers to a part of the mounts that the
shared anonymous mount namespace has collected. So this would leak
mounts.
(2) Removing FMODE_NEED_UNMOUNT from the struct file of the source
detached mount tree
In the current state of the mount api the creation of two file
descriptors that refer to different detached mount trees but to
the same anonymous mount namespace is already possible. See the
fd_tree3/fd_tree4 examples above.
In those cases only one of the two file descriptors will actually
end up unmounting and destroying the detached mount tree.
Whether or not a struct file needs to unmount and destroy an
anonymous mount namespace is governed by the FMODE_NEED_UNMOUNT
flag. In the fd_tree3/fd_tree4 example above only fd_tree3 will
refer to a struct file that has FMODE_NEED_UNMOUNT set.
A similar solution would work for mounting detached mounts onto
detached mounts. When the source detached mount tree is moved to
the target detached mount tree and thus from the source anonymous
mount namespace to the target anonymous mount namespace the
FMODE_NEED_UNMOUNT flag will be removed from the struct file of
the source detached mount tree.
In the above example the FMODE_NEED_UNMOUNT would be removed from
the struct file that fd_tree1 refers to.
This requires that the source file descriptor fd_tree1 needs to be
kept open until move_mount() is finished so that FMODE_NEED_UNMOUNT
can be removed:
move_mount(fd_tree1, "", fd_tree2, "",
MOVE_MOUNT_F_EMPTY_PATH |
MOVE_MOUNT_T_EMPTY_PATH)
/*
* Remove FMODE_NEED_UNMOUNT so closing fd_tree1 will leave the
* mount tree alone.
*/
close(fd_tree1);
/*
* Remove the whole mount tree including all the mounts that
* were moved from fd_tree1 into fd_tree2.
*/
close(fd_tree2);
Since the source detached mount tree fd_tree1 has now become an
attached mount tree, i.e., fd_tree1_mnt->mnt_parent == fd_tree2_mnt
is is ineligible for attaching again as move_mount() requires that a
detached mount tree can only be attached if it is the root of an
anonymous mount namespace.
Removing FMODE_NEED_UNMOUNT doesn't require to hold @namespace_sem.
Attaching @fd_tree1 to @fd_tree2 requires holding @namespace_sem and
so does dissolve_on_fput() should @fd_tree2 have been closed
concurrently.
While removing FMODE_NEED_UNMOUNT can be done it would require some
ugly hacking similar to what's done for splice to remove
FMODE_NOWAIT. That's ugly.
(3) Use the fact that @fd_tree1 will have a parent mount once it has
been attached to @fd_tree2.
When dissolve_on_fput() is called the mount that has been passed in
will refer to the root of the anonymous mount namespace. If it
doesn't it would mean that mounts are leaked. So before allowing to
mount detached mounts onto detached mounts this would be a bug.
Now that detached mounts can be mounted onto detached mounts it just
means that the mount has been attached to another anonymous mount
namespace and thus dissolve_on_fput() must not unmount the mount
tree or free the anonymous mount namespace as the file referring to
the root of the namespace hasn't been closed yet.
If it had been closed yet it would be obvious because the mount
namespace would be NULL, i.e., the @fd_tree1 would have already been
unmounted. If @fd_tree1 hasn't been unmounted yet and has a parent
mount it is safe to skip any cleanup as closing @fd_tree2 will take
care of all cleanup operations.
Imho, (3) is the cleanest solution and thus has been chosen.
* patches from https://lore.kernel.org/r/20250221-brauner-open_tree-v1-0-dbcfcb98c676@kernel.org:
selftests: seventh test for mounting detached mounts onto detached mounts
selftests: sixth test for mounting detached mounts onto detached mounts
selftests: fifth test for mounting detached mounts onto detached mounts
selftests: fourth test for mounting detached mounts onto detached mounts
selftests: third test for mounting detached mounts onto detached mounts
selftests: second test for mounting detached mounts onto detached mounts
selftests: first test for mounting detached mounts onto detached mounts
fs: mount detached mounts onto detached mounts
fs: support getname_maybe_null() in move_mount()
selftests: create detached mounts from detached mounts
fs: create detached mounts from detached mounts
fs: add may_copy_tree()
fs: add fastpath for dissolve_on_fput()
fs: add assert for move_mount()
fs: add mnt_ns_empty() helper
fs: record sequence number of origin mount namespace
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-0-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
-rw-r--r-- | fs/mount.h | 6 | ||||
-rw-r--r-- | fs/namespace.c | 348 | ||||
-rw-r--r-- | include/linux/fs.h | 1 | ||||
-rw-r--r-- | tools/testing/selftests/mount_setattr/mount_setattr_test.c | 591 |
4 files changed, 886 insertions, 60 deletions
diff --git a/fs/mount.h b/fs/mount.h index ffb613cdfeee..e2501a724688 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -20,6 +20,7 @@ struct mnt_namespace { wait_queue_head_t poll; struct rcu_head mnt_ns_rcu; }; + u64 seq_origin; /* Sequence number of origin mount namespace */ u64 event; unsigned int nr_mounts; /* # of mounts in the namespace */ unsigned int pending_mounts; @@ -156,6 +157,11 @@ static inline bool mnt_ns_attached(const struct mount *mnt) return !RB_EMPTY_NODE(&mnt->mnt_node); } +static inline bool mnt_ns_empty(const struct mnt_namespace *ns) +{ + return RB_EMPTY_ROOT(&ns->mounts); +} + static inline void move_from_ns(struct mount *mnt, struct list_head *dt_list) { struct mnt_namespace *ns = mnt->mnt_ns; diff --git a/fs/namespace.c b/fs/namespace.c index a3ed3f2980cb..7f7ae1b2ce3a 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -998,6 +998,12 @@ static inline int check_mnt(struct mount *mnt) return mnt->mnt_ns == current->nsproxy->mnt_ns; } +static inline bool check_anonymous_mnt(struct mount *mnt) +{ + return is_anon_ns(mnt->mnt_ns) && + mnt->mnt_ns->seq_origin == current->nsproxy->mnt_ns->seq; +} + /* * vfsmount lock must be held for write */ @@ -2246,22 +2252,75 @@ struct vfsmount *collect_mounts(const struct path *path) static void free_mnt_ns(struct mnt_namespace *); static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *, bool); +static inline bool must_dissolve(struct mnt_namespace *mnt_ns) +{ + /* + * This mount belonged to an anonymous mount namespace + * but was moved to a non-anonymous mount namespace and + * then unmounted. + */ + if (unlikely(!mnt_ns)) + return false; + + /* + * This mount belongs to a non-anonymous mount namespace + * and we know that such a mount can never transition to + * an anonymous mount namespace again. + */ + if (!is_anon_ns(mnt_ns)) { + /* + * A detached mount either belongs to an anonymous mount + * namespace or a non-anonymous mount namespace. It + * should never belong to something purely internal. + */ + VFS_WARN_ON_ONCE(mnt_ns == MNT_NS_INTERNAL); + return false; + } + + return true; +} + void dissolve_on_fput(struct vfsmount *mnt) { struct mnt_namespace *ns; - namespace_lock(); - lock_mount_hash(); - ns = real_mount(mnt)->mnt_ns; - if (ns) { - if (is_anon_ns(ns)) - umount_tree(real_mount(mnt), UMOUNT_CONNECTED); - else - ns = NULL; + struct mount *m = real_mount(mnt); + + scoped_guard(rcu) { + if (!must_dissolve(READ_ONCE(m->mnt_ns))) + return; } - unlock_mount_hash(); - namespace_unlock(); - if (ns) - free_mnt_ns(ns); + + scoped_guard(rwsem_write, &namespace_sem) { + ns = m->mnt_ns; + if (!must_dissolve(ns)) + return; + + /* + * After must_dissolve() we know that this is a detached + * mount in an anonymous mount namespace. + * + * Now when mnt_has_parent() reports that this mount + * tree has a parent, we know that this anonymous mount + * tree has been moved to another anonymous mount + * namespace. + * + * So when closing this file we cannot unmount the mount + * tree. This will be done when the file referring to + * the root of the anonymous mount namespace will be + * closed (It could already be closed but it would sync + * on @namespace_sem and wait for us to finish.). + */ + if (mnt_has_parent(m)) + return; + + lock_mount_hash(); + umount_tree(m, UMOUNT_CONNECTED); + unlock_mount_hash(); + } + + /* Make sure we notice when we leak mounts. */ + VFS_WARN_ON_ONCE(!mnt_ns_empty(ns)); + free_mnt_ns(ns); } void drop_collected_mounts(struct vfsmount *mnt) @@ -2424,6 +2483,7 @@ int count_mounts(struct mnt_namespace *ns, struct mount *mnt) enum mnt_tree_flags_t { MNT_TREE_MOVE = BIT(0), MNT_TREE_BENEATH = BIT(1), + MNT_TREE_PROPAGATION = BIT(2), }; /** @@ -2773,6 +2833,71 @@ static int do_change_type(struct path *path, int ms_flags) return err; } +/* may_copy_tree() - check if a mount tree can be copied + * @path: path to the mount tree to be copied + * + * This helper checks if the caller may copy the mount tree starting + * from @path->mnt. The caller may copy the mount tree under the + * following circumstances: + * + * (1) The caller is located in the mount namespace of the mount tree. + * This also implies that the mount does not belong to an anonymous + * mount namespace. + * (2) The caller tries to copy an nfs mount referring to a mount + * namespace, i.e., the caller is trying to copy a mount namespace + * entry from nsfs. + * (3) The caller tries to copy a pidfs mount referring to a pidfd. + * (4) The caller is trying to copy a mount tree that belongs to an + * anonymous mount namespace. + * + * For that to be safe, this helper enforces that the origin mount + * namespace the anonymous mount namespace was created from is the + * same as the caller's mount namespace by comparing the sequence + * numbers. + * + * This is not strictly necessary. The current semantics of the new + * mount api enforce that the caller must be located in the same + * mount namespace as the mount tree it interacts with. Using the + * origin sequence number preserves these semantics even for + * anonymous mount namespaces. However, one could envision extending + * the api to directly operate across mount namespace if needed. + * + * The ownership of a non-anonymous mount namespace such as the + * caller's cannot change. + * => We know that the caller's mount namespace is stable. + * + * If the origin sequence number of the anonymous mount namespace is + * the same as the sequence number of the caller's mount namespace. + * => The owning namespaces are the same. + * + * ==> The earlier capability check on the owning namespace of the + * caller's mount namespace ensures that the caller has the + * ability to copy the mount tree. + * + * Returns true if the mount tree can be copied, false otherwise. + */ +static inline bool may_copy_tree(struct path *path) +{ + struct mount *mnt = real_mount(path->mnt); + const struct dentry_operations *d_op; + + if (check_mnt(mnt)) + return true; + + d_op = path->dentry->d_op; + if (d_op == &ns_dentry_operations) + return true; + + if (d_op == &pidfs_dentry_operations) + return true; + + if (!is_mounted(path->mnt)) + return false; + + return check_anonymous_mnt(mnt); +} + + static struct mount *__do_loopback(struct path *old_path, int recurse) { struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt); @@ -2780,13 +2905,8 @@ static struct mount *__do_loopback(struct path *old_path, int recurse) if (IS_MNT_UNBINDABLE(old)) return mnt; - if (!check_mnt(old)) { - const struct dentry_operations *d_op = old_path->dentry->d_op; - - if (d_op != &ns_dentry_operations && - d_op != &pidfs_dentry_operations) - return mnt; - } + if (!may_copy_tree(old_path)) + return mnt; if (!recurse && has_locked_children(old, old_path->dentry)) return mnt; @@ -2853,15 +2973,30 @@ out: static struct file *open_detached_copy(struct path *path, bool recursive) { - struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns; - struct mnt_namespace *ns = alloc_mnt_ns(user_ns, true); + struct mnt_namespace *ns, *mnt_ns = current->nsproxy->mnt_ns, *src_mnt_ns; + struct user_namespace *user_ns = mnt_ns->user_ns; struct mount *mnt, *p; struct file *file; + ns = alloc_mnt_ns(user_ns, true); if (IS_ERR(ns)) return ERR_CAST(ns); namespace_lock(); + + /* + * Record the sequence number of the source mount namespace. + * This needs to hold namespace_sem to ensure that the mount + * doesn't get attached. + */ + if (is_mounted(path->mnt)) { + src_mnt_ns = real_mount(path->mnt)->mnt_ns; + if (is_anon_ns(src_mnt_ns)) + ns->seq_origin = src_mnt_ns->seq_origin; + else + ns->seq_origin = src_mnt_ns->seq; + } + mnt = __do_loopback(path, recursive); if (IS_ERR(mnt)) { namespace_unlock(); @@ -3320,8 +3455,56 @@ static int can_move_mount_beneath(const struct path *from, return 0; } -static int do_move_mount(struct path *old_path, struct path *new_path, - bool beneath) +/* may_use_mount() - check if a mount tree can be used + * @mnt: vfsmount to be used + * + * This helper checks if the caller may use the mount tree starting + * from @path->mnt. The caller may use the mount tree under the + * following circumstances: + * + * (1) The caller is located in the mount namespace of the mount tree. + * This also implies that the mount does not belong to an anonymous + * mount namespace. + * (2) The caller is trying to use a mount tree that belongs to an + * anonymous mount namespace. + * + * For that to be safe, this helper enforces that the origin mount + * namespace the anonymous mount namespace was created from is the + * same as the caller's mount namespace by comparing the sequence + * numbers. + * + * The ownership of a non-anonymous mount namespace such as the + * caller's cannot change. + * => We know that the caller's mount namespace is stable. + * + * If the origin sequence number of the anonymous mount namespace is + * the same as the sequence number of the caller's mount namespace. + * => The owning namespaces are the same. + * + * ==> The earlier capability check on the owning namespace of the + * caller's mount namespace ensures that the caller has the + * ability to use the mount tree. + * + * Returns true if the mount tree can be used, false otherwise. + */ +static inline bool may_use_mount(struct mount *mnt) +{ + if (check_mnt(mnt)) + return true; + + /* + * Make sure that noone unmounted the target path or somehow + * managed to get their hands on something purely kernel + * internal. + */ + if (!is_mounted(&mnt->mnt)) + return false; + + return check_anonymous_mnt(mnt); +} + +static int do_move_mount(struct path *old_path, + struct path *new_path, enum mnt_tree_flags_t flags) { struct mnt_namespace *ns; struct mount *p; @@ -3329,8 +3512,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path, struct mount *parent; struct mountpoint *mp, *old_mp; int err; - bool attached; - enum mnt_tree_flags_t flags = 0; + bool attached, beneath = flags & MNT_TREE_BENEATH; mp = do_lock_mount(new_path, beneath); if (IS_ERR(mp)) @@ -3346,8 +3528,14 @@ static int do_move_mount(struct path *old_path, struct path *new_path, ns = old->mnt_ns; err = -EINVAL; - /* The mountpoint must be in our namespace. */ - if (!check_mnt(p)) + if (!may_use_mount(p)) + goto out; + + /* + * Don't allow moving an attached mount tree to an anonymous + * mount tree. + */ + if (!is_anon_ns(ns) && is_anon_ns(p->mnt_ns)) goto out; /* The thing moved must be mounted... */ @@ -3358,6 +3546,16 @@ static int do_move_mount(struct path *old_path, struct path *new_path, if (!(attached ? check_mnt(old) : is_anon_ns(ns))) goto out; + /* + * Ending up with two files referring to the root of the same + * anonymous mount namespace would cause an error as this would + * mean trying to move the same mount twice into the mount tree + * which would be rejected later. But be explicit about it right + * here. + */ + if (is_anon_ns(ns) && is_anon_ns(p->mnt_ns) && ns == p->mnt_ns) + goto out; + if (old->mnt.mnt_flags & MNT_LOCKED) goto out; @@ -3408,10 +3606,13 @@ static int do_move_mount(struct path *old_path, struct path *new_path, out: unlock_mount(mp); if (!err) { - if (attached) + if (attached) { mntput_no_expire(parent); - else + } else { + /* Make sure we notice when we leak mounts. */ + VFS_WARN_ON_ONCE(!mnt_ns_empty(ns)); free_mnt_ns(ns); + } } return err; } @@ -3428,7 +3629,7 @@ static int do_move_mount_old(struct path *path, const char *old_name) if (err) return err; - err = do_move_mount(&old_path, path, false); + err = do_move_mount(&old_path, path, 0); path_put(&old_path); return err; } @@ -4269,6 +4470,21 @@ err_unlock: return ret; } +static inline int vfs_move_mount(struct path *from_path, struct path *to_path, + enum mnt_tree_flags_t mflags) +{ + int ret; + + ret = security_move_mount(from_path, to_path); + if (ret) + return ret; + + if (mflags & MNT_TREE_PROPAGATION) + return do_set_group(from_path, to_path); + + return do_move_mount(from_path, to_path, mflags); +} + /* * Move a mount from one place to another. In combination with * fsopen()/fsmount() this is used to install a new mount and in combination @@ -4282,8 +4498,12 @@ SYSCALL_DEFINE5(move_mount, int, to_dfd, const char __user *, to_pathname, unsigned int, flags) { - struct path from_path, to_path; - unsigned int lflags; + struct path to_path __free(path_put) = {}; + struct path from_path __free(path_put) = {}; + struct filename *to_name __free(putname) = NULL; + struct filename *from_name __free(putname) = NULL; + unsigned int lflags, uflags; + enum mnt_tree_flags_t mflags = 0; int ret = 0; if (!may_mount()) @@ -4296,43 +4516,51 @@ SYSCALL_DEFINE5(move_mount, (MOVE_MOUNT_BENEATH | MOVE_MOUNT_SET_GROUP)) return -EINVAL; - /* If someone gives a pathname, they aren't permitted to move - * from an fd that requires unmount as we can't get at the flag - * to clear it afterwards. - */ + if (flags & MOVE_MOUNT_SET_GROUP) mflags |= MNT_TREE_PROPAGATION; + if (flags & MOVE_MOUNT_BENEATH) mflags |= MNT_TREE_BENEATH; + lflags = 0; if (flags & MOVE_MOUNT_F_SYMLINKS) lflags |= LOOKUP_FOLLOW; if (flags & MOVE_MOUNT_F_AUTOMOUNTS) lflags |= LOOKUP_AUTOMOUNT; - if (flags & MOVE_MOUNT_F_EMPTY_PATH) lflags |= LOOKUP_EMPTY; - - ret = user_path_at(from_dfd, from_pathname, lflags, &from_path); - if (ret < 0) - return ret; + if (flags & MOVE_MOUNT_F_EMPTY_PATH) uflags = AT_EMPTY_PATH; + from_name = getname_maybe_null(from_pathname, uflags); + if (IS_ERR(from_name)) + return PTR_ERR(from_name); lflags = 0; if (flags & MOVE_MOUNT_T_SYMLINKS) lflags |= LOOKUP_FOLLOW; if (flags & MOVE_MOUNT_T_AUTOMOUNTS) lflags |= LOOKUP_AUTOMOUNT; - if (flags & MOVE_MOUNT_T_EMPTY_PATH) lflags |= LOOKUP_EMPTY; + if (flags & MOVE_MOUNT_T_EMPTY_PATH) uflags = AT_EMPTY_PATH; + to_name = getname_maybe_null(to_pathname, uflags); + if (IS_ERR(to_name)) + return PTR_ERR(to_name); + + if (!to_name && to_dfd >= 0) { + CLASS(fd_raw, f_to)(to_dfd); + if (fd_empty(f_to)) + return -EBADF; + + to_path = fd_file(f_to)->f_path; + path_get(&to_path); + } else { + ret = filename_lookup(to_dfd, to_name, lflags, &to_path, NULL); + if (ret) + return ret; + } - ret = user_path_at(to_dfd, to_pathname, lflags, &to_path); - if (ret < 0) - goto out_from; + if (!from_name && from_dfd >= 0) { + CLASS(fd_raw, f_from)(from_dfd); + if (fd_empty(f_from)) + return -EBADF; - ret = security_move_mount(&from_path, &to_path); - if (ret < 0) - goto out_to; + return vfs_move_mount(&fd_file(f_from)->f_path, &to_path, mflags); + } - if (flags & MOVE_MOUNT_SET_GROUP) - ret = do_set_group(&from_path, &to_path); - else - ret = do_move_mount(&from_path, &to_path, - (flags & MOVE_MOUNT_BENEATH)); + ret = filename_lookup(from_dfd, from_name, lflags, &from_path, NULL); + if (ret) + return ret; -out_to: - path_put(&to_path); -out_from: - path_put(&from_path); - return ret; + return vfs_move_mount(&from_path, &to_path, mflags); } /* @@ -5300,7 +5528,7 @@ static int grab_requested_root(struct mnt_namespace *ns, struct path *root) * We have to find the first mount in our ns and use that, however it * may not exist, so handle that properly. */ - if (RB_EMPTY_ROOT(&ns->mounts)) + if (mnt_ns_empty(ns)) return -ENOENT; first = child = ns->root; @@ -5325,7 +5553,7 @@ static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id, int err; /* Has the namespace already been emptied? */ - if (mnt_ns_id && RB_EMPTY_ROOT(&ns->mounts)) + if (mnt_ns_id && mnt_ns_empty(ns)) return -ENOENT; s->mnt = lookup_mnt_in_ns(mnt_id, ns); diff --git a/include/linux/fs.h b/include/linux/fs.h index e71d58c7f59c..7e9df867538d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2855,6 +2855,7 @@ static inline struct filename *getname_maybe_null(const char __user *name, int f return __getname_maybe_null(name); } extern void putname(struct filename *name); +DEFINE_FREE(putname, struct filename *, if (!IS_ERR_OR_NULL(_T)) putname(_T)) extern int finish_open(struct file *file, struct dentry *dentry, int (*open)(struct inode *, struct file *)); diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c index 70f65eb320a7..1e0508cb5c2d 100644 --- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c +++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c @@ -126,6 +126,26 @@ #endif #endif +#ifndef __NR_move_mount + #if defined __alpha__ + #define __NR_move_mount 539 + #elif defined _MIPS_SIM + #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ + #define __NR_move_mount 4429 + #endif + #if _MIPS_SIM == _MIPS_SIM_NABI32 /* n32 */ + #define __NR_move_mount 6429 + #endif + #if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */ + #define __NR_move_mount 5429 + #endif + #elif defined __ia64__ + #define __NR_move_mount (428 + 1024) + #else + #define __NR_move_mount 429 + #endif +#endif + #ifndef MOUNT_ATTR_IDMAP #define MOUNT_ATTR_IDMAP 0x00100000 #endif @@ -157,6 +177,51 @@ static inline int sys_open_tree(int dfd, const char *filename, unsigned int flag return syscall(__NR_open_tree, dfd, filename, flags); } +/* move_mount() flags */ +#ifndef MOVE_MOUNT_F_SYMLINKS +#define MOVE_MOUNT_F_SYMLINKS 0x00000001 /* Follow symlinks on from path */ +#endif + +#ifndef MOVE_MOUNT_F_AUTOMOUNTS +#define MOVE_MOUNT_F_AUTOMOUNTS 0x00000002 /* Follow automounts on from path */ +#endif + +#ifndef MOVE_MOUNT_F_EMPTY_PATH +#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */ +#endif + +#ifndef MOVE_MOUNT_T_SYMLINKS +#define MOVE_MOUNT_T_SYMLINKS 0x00000010 /* Follow symlinks on to path */ +#endif + +#ifndef MOVE_MOUNT_T_AUTOMOUNTS +#define MOVE_MOUNT_T_AUTOMOUNTS 0x00000020 /* Follow automounts on to path */ +#endif + +#ifndef MOVE_MOUNT_T_EMPTY_PATH +#define MOVE_MOUNT_T_EMPTY_PATH 0x00000040 /* Empty to path permitted */ +#endif + +#ifndef MOVE_MOUNT_SET_GROUP +#define MOVE_MOUNT_SET_GROUP 0x00000100 /* Set sharing group instead */ +#endif + +#ifndef MOVE_MOUNT_BENEATH +#define MOVE_MOUNT_BENEATH 0x00000200 /* Mount beneath top mount */ +#endif + +#ifndef MOVE_MOUNT__MASK +#define MOVE_MOUNT__MASK 0x00000377 +#endif + +static inline int sys_move_mount(int from_dfd, const char *from_pathname, + int to_dfd, const char *to_pathname, + unsigned int flags) +{ + return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, + to_pathname, flags); +} + static ssize_t write_nointr(int fd, const void *buf, size_t count) { ssize_t ret; @@ -397,6 +462,10 @@ FIXTURE_SETUP(mount_setattr) ASSERT_EQ(mkdir("/tmp/B/BB", 0777), 0); + ASSERT_EQ(mkdir("/tmp/target1", 0777), 0); + + ASSERT_EQ(mkdir("/tmp/target2", 0777), 0); + ASSERT_EQ(mount("testing", "/tmp/B/BB", "tmpfs", MS_NOATIME | MS_NODEV, "size=100000,mode=700"), 0); @@ -1506,4 +1575,526 @@ TEST_F(mount_setattr, mount_attr_nosymfollow) ASSERT_EQ(close(fd), 0); } +TEST_F(mount_setattr, open_tree_detached) +{ + int fd_tree_base = -EBADF, fd_tree_subdir = -EBADF; + struct statx stx; + + fd_tree_base = sys_open_tree(-EBADF, "/mnt", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_base, 0); + /* + * /mnt testing tmpfs + * |-/mnt/A testing tmpfs + * | `-/mnt/A/AA testing tmpfs + * | `-/mnt/A/AA/B testing tmpfs + * | `-/mnt/A/AA/B/BB testing tmpfs + * `-/mnt/B testing ramfs + */ + ASSERT_EQ(statx(fd_tree_base, "A", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA/B", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA/B/BB", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + fd_tree_subdir = sys_open_tree(fd_tree_base, "A/AA", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_subdir, 0); + /* + * /AA testing tmpfs + * `-/AA/B testing tmpfs + * `-/AA/B/BB testing tmpfs + */ + ASSERT_EQ(statx(fd_tree_subdir, "B", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_subdir, "B/BB", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + ASSERT_EQ(move_mount(fd_tree_subdir, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); + /* + * /tmp/target1 testing tmpfs + * `-/tmp/target1/B testing tmpfs + * `-/tmp/target1/B/BB testing tmpfs + */ + ASSERT_EQ(statx(-EBADF, "/tmp/target1", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(-EBADF, "/tmp/target1/B", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(-EBADF, "/tmp/target1/B/BB", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + ASSERT_EQ(move_mount(fd_tree_base, "", -EBADF, "/tmp/target2", MOVE_MOUNT_F_EMPTY_PATH), 0); + /* + * /tmp/target2 testing tmpfs + * |-/tmp/target2/A testing tmpfs + * | `-/tmp/target2/A/AA testing tmpfs + * | `-/tmp/target2/A/AA/B testing tmpfs + * | `-/tmp/target2/A/AA/B/BB testing tmpfs + * `-/tmp/target2/B testing ramfs + */ + ASSERT_EQ(statx(-EBADF, "/tmp/target2", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(-EBADF, "/tmp/target2/A", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(-EBADF, "/tmp/target2/A/AA", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(-EBADF, "/tmp/target2/A/AA/B", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(-EBADF, "/tmp/target2/A/AA/B/BB", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(-EBADF, "/tmp/target2/B", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + EXPECT_EQ(close(fd_tree_base), 0); + EXPECT_EQ(close(fd_tree_subdir), 0); +} + +TEST_F(mount_setattr, open_tree_detached_fail) +{ + int fd_tree_base = -EBADF, fd_tree_subdir = -EBADF; + struct statx stx; + + fd_tree_base = sys_open_tree(-EBADF, "/mnt", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_base, 0); + /* + * /mnt testing tmpfs + * |-/mnt/A testing tmpfs + * | `-/mnt/A/AA testing tmpfs + * | `-/mnt/A/AA/B testing tmpfs + * | `-/mnt/A/AA/B/BB testing tmpfs + * `-/mnt/B testing ramfs + */ + ASSERT_EQ(statx(fd_tree_base, "A", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA/B", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA/B/BB", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + ASSERT_EQ(unshare(CLONE_NEWNS), 0); + + /* + * The origin mount namespace of the anonymous mount namespace + * of @fd_tree_base doesn't match the caller's mount namespace + * anymore so creation of another detached mounts must fail. + */ + fd_tree_subdir = sys_open_tree(fd_tree_base, "A/AA", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_LT(fd_tree_subdir, 0); + ASSERT_EQ(errno, EINVAL); +} + +TEST_F(mount_setattr, open_tree_detached_fail2) +{ + int fd_tree_base = -EBADF, fd_tree_subdir = -EBADF; + struct statx stx; + + fd_tree_base = sys_open_tree(-EBADF, "/mnt", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_base, 0); + /* + * /mnt testing tmpfs + * |-/mnt/A testing tmpfs + * | `-/mnt/A/AA testing tmpfs + * | `-/mnt/A/AA/B testing tmpfs + * | `-/mnt/A/AA/B/BB testing tmpfs + * `-/mnt/B testing ramfs + */ + ASSERT_EQ(statx(fd_tree_base, "A", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA/B", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA/B/BB", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + EXPECT_EQ(create_and_enter_userns(), 0); + + /* + * The caller entered a new user namespace. They will have + * CAP_SYS_ADMIN in this user namespace. However, they're still + * located in a mount namespace that is owned by an ancestor + * user namespace in which they hold no privilege. Creating a + * detached mount must thus fail. + */ + fd_tree_subdir = sys_open_tree(fd_tree_base, "A/AA", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_LT(fd_tree_subdir, 0); + ASSERT_EQ(errno, EPERM); +} + +TEST_F(mount_setattr, open_tree_detached_fail3) +{ + int fd_tree_base = -EBADF, fd_tree_subdir = -EBADF; + struct statx stx; + + fd_tree_base = sys_open_tree(-EBADF, "/mnt", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_base, 0); + /* + * /mnt testing tmpfs + * |-/mnt/A testing tmpfs + * | `-/mnt/A/AA testing tmpfs + * | `-/mnt/A/AA/B testing tmpfs + * | `-/mnt/A/AA/B/BB testing tmpfs + * `-/mnt/B testing ramfs + */ + ASSERT_EQ(statx(fd_tree_base, "A", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA/B", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_EQ(statx(fd_tree_base, "A/AA/B/BB", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + EXPECT_EQ(prepare_unpriv_mountns(), 0); + + /* + * The caller entered a new mount namespace. They will have + * CAP_SYS_ADMIN in the owning user namespace of their mount + * namespace. + * + * However, the origin mount namespace of the anonymous mount + * namespace of @fd_tree_base doesn't match the caller's mount + * namespace anymore so creation of another detached mounts must + * fail. + */ + fd_tree_subdir = sys_open_tree(fd_tree_base, "A/AA", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_LT(fd_tree_subdir, 0); + ASSERT_EQ(errno, EINVAL); +} + +TEST_F(mount_setattr, mount_detached_mount_on_detached_mount_then_close) +{ + int fd_tree_base = -EBADF, fd_tree_subdir = -EBADF; + struct statx stx; + + fd_tree_base = sys_open_tree(-EBADF, "/mnt", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_base, 0); + /* + * /mnt testing tmpfs + */ + ASSERT_EQ(statx(fd_tree_base, "A", 0, 0, &stx), 0); + ASSERT_FALSE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + fd_tree_subdir = sys_open_tree(fd_tree_base, "", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_EMPTY_PATH | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_subdir, 0); + /* + * /mnt testing tmpfs + */ + ASSERT_EQ(statx(fd_tree_subdir, "A", 0, 0, &stx), 0); + ASSERT_FALSE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + /* + * /mnt testing tmpfs + * `-/mnt testing tmpfs + */ + ASSERT_EQ(move_mount(fd_tree_subdir, "", fd_tree_base, "", MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH), 0); + ASSERT_EQ(statx(fd_tree_subdir, "", AT_EMPTY_PATH, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + ASSERT_NE(move_mount(fd_tree_subdir, "", fd_tree_base, "", MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH), 0); + + EXPECT_EQ(close(fd_tree_base), 0); + EXPECT_EQ(close(fd_tree_subdir), 0); +} + +TEST_F(mount_setattr, mount_detached_mount_on_detached_mount_and_attach) +{ + int fd_tree_base = -EBADF, fd_tree_subdir = -EBADF; + struct statx stx; + __u64 mnt_id = 0; + + fd_tree_base = sys_open_tree(-EBADF, "/mnt", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_base, 0); + /* + * /mnt testing tmpfs + */ + ASSERT_EQ(statx(fd_tree_base, "A", 0, 0, &stx), 0); + ASSERT_FALSE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + fd_tree_subdir = sys_open_tree(fd_tree_base, "", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_EMPTY_PATH | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_subdir, 0); + /* + * /mnt testing tmpfs + */ + ASSERT_EQ(statx(fd_tree_subdir, "A", 0, 0, &stx), 0); + ASSERT_FALSE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + /* + * /mnt testing tmpfs + * `-/mnt testing tmpfs + */ + ASSERT_EQ(move_mount(fd_tree_subdir, "", fd_tree_base, "", MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH), 0); + ASSERT_EQ(statx(fd_tree_subdir, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_TRUE(stx.stx_mask & STATX_MNT_ID_UNIQUE); + mnt_id = stx.stx_mnt_id; + + ASSERT_NE(move_mount(fd_tree_subdir, "", fd_tree_base, "", MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH), 0); + + ASSERT_EQ(move_mount(fd_tree_base, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); + ASSERT_EQ(statx(-EBADF, "/tmp/target1", 0, STATX_MNT_ID_UNIQUE, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + ASSERT_TRUE(stx.stx_mask & STATX_MNT_ID_UNIQUE); + ASSERT_EQ(stx.stx_mnt_id, mnt_id); + + EXPECT_EQ(close(fd_tree_base), 0); + EXPECT_EQ(close(fd_tree_subdir), 0); +} + +TEST_F(mount_setattr, move_mount_detached_fail) +{ + int fd_tree_base = -EBADF, fd_tree_subdir = -EBADF; + struct statx stx; + + fd_tree_base = sys_open_tree(-EBADF, "/mnt", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_base, 0); + + /* Attach the mount to the caller's mount namespace. */ + ASSERT_EQ(move_mount(fd_tree_base, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); + + ASSERT_EQ(statx(fd_tree_base, "A", 0, 0, &stx), 0); + ASSERT_FALSE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + fd_tree_subdir = sys_open_tree(-EBADF, "/tmp/B", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE); + ASSERT_GE(fd_tree_subdir, 0); + ASSERT_EQ(statx(fd_tree_subdir, "BB", 0, 0, &stx), 0); + ASSERT_FALSE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + /* Not allowed to move an attached mount to a detached mount. */ + ASSERT_NE(move_mount(fd_tree_base, "", fd_tree_subdir, "", MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH), 0); + ASSERT_EQ(errno, EINVAL); + + EXPECT_EQ(close(fd_tree_base), 0); + EXPECT_EQ(close(fd_tree_subdir), 0); +} + +TEST_F(mount_setattr, attach_detached_mount_then_umount_then_close) +{ + int fd_tree = -EBADF; + struct statx stx; + + fd_tree = sys_open_tree(-EBADF, "/mnt", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree, 0); + + ASSERT_EQ(statx(fd_tree, "A", 0, 0, &stx), 0); + /* We copied with AT_RECURSIVE so /mnt/A must be a mountpoint. */ + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + /* Attach the mount to the caller's mount namespace. */ + ASSERT_EQ(move_mount(fd_tree, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); + + ASSERT_EQ(statx(-EBADF, "/tmp/target1", 0, 0, &stx), 0); + ASSERT_TRUE(stx.stx_attributes & STATX_ATTR_MOUNT_ROOT); + + ASSERT_EQ(umount2("/tmp/target1", MNT_DETACH), 0); + + /* + * This tests whether dissolve_on_fput() handles a NULL mount + * namespace correctly, i.e., that it doesn't splat. + */ + EXPECT_EQ(close(fd_tree), 0); +} + +TEST_F(mount_setattr, mount_detached1_onto_detached2_then_close_detached1_then_mount_detached2_onto_attached) +{ + int fd_tree1 = -EBADF, fd_tree2 = -EBADF; + + /* + * |-/mnt/A testing tmpfs + * `-/mnt/A/AA testing tmpfs + * `-/mnt/A/AA/B testing tmpfs + * `-/mnt/A/AA/B/BB testing tmpfs + */ + fd_tree1 = sys_open_tree(-EBADF, "/mnt/A", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree1, 0); + + /* + * `-/mnt/B testing ramfs + */ + fd_tree2 = sys_open_tree(-EBADF, "/mnt/B", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_EMPTY_PATH | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree2, 0); + + /* + * Move the source detached mount tree to the target detached + * mount tree. This will move all the mounts in the source mount + * tree from the source anonymous mount namespace to the target + * anonymous mount namespace. + * + * The source detached mount tree and the target detached mount + * tree now both refer to the same anonymous mount namespace. + * + * |-"" testing ramfs + * `-"" testing tmpfs + * `-""/AA testing tmpfs + * `-""/AA/B testing tmpfs + * `-""/AA/B/BB testing tmpfs + */ + ASSERT_EQ(move_mount(fd_tree1, "", fd_tree2, "", MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH), 0); + + /* + * The source detached mount tree @fd_tree1 is now an attached + * mount, i.e., it has a parent. Specifically, it now has the + * root mount of the mount tree of @fd_tree2 as its parent. + * + * That means we are no longer allowed to attach it as we only + * allow attaching the root of an anonymous mount tree, not + * random bits and pieces. Verify that the kernel enforces this. + */ + ASSERT_NE(move_mount(fd_tree1, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); + + /* + * Closing the source detached mount tree must not unmount and + * free the shared anonymous mount namespace. The kernel will + * quickly yell at us because the anonymous mount namespace + * won't be empty when it's freed. + */ + EXPECT_EQ(close(fd_tree1), 0); + + /* + * Attach the mount tree to a non-anonymous mount namespace. + * This can only succeed if closing fd_tree1 had proper + * semantics and didn't cause the anonymous mount namespace to + * be freed. If it did this will trigger a UAF which will be + * visible on any KASAN enabled kernel. + * + * |-/tmp/target1 testing ramfs + * `-/tmp/target1 testing tmpfs + * `-/tmp/target1/AA testing tmpfs + * `-/tmp/target1/AA/B testing tmpfs + * `-/tmp/target1/AA/B/BB testing tmpfs + */ + ASSERT_EQ(move_mount(fd_tree2, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); + EXPECT_EQ(close(fd_tree2), 0); +} + +TEST_F(mount_setattr, two_detached_mounts_referring_to_same_anonymous_mount_namespace) +{ + int fd_tree1 = -EBADF, fd_tree2 = -EBADF; + + /* + * Copy the following mount tree: + * + * |-/mnt/A testing tmpfs + * `-/mnt/A/AA testing tmpfs + * `-/mnt/A/AA/B testing tmpfs + * `-/mnt/A/AA/B/BB testing tmpfs + */ + fd_tree1 = sys_open_tree(-EBADF, "/mnt/A", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree1, 0); + + /* + * Create an O_PATH file descriptors with a separate struct file + * that refers to the same detached mount tree as @fd_tree1 + */ + fd_tree2 = sys_open_tree(fd_tree1, "", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_EMPTY_PATH | OPEN_TREE_CLOEXEC); + ASSERT_GE(fd_tree2, 0); + + /* + * Copy the following mount tree: + * + * |-/tmp/target1 testing tmpfs + * `-/tmp/target1/AA testing tmpfs + * `-/tmp/target1/AA/B testing tmpfs + * `-/tmp/target1/AA/B/BB testing tmpfs + */ + ASSERT_EQ(move_mount(fd_tree2, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); + + /* + * This must fail as this would mean adding the same mount tree + * into the same mount tree. + */ + ASSERT_NE(move_mount(fd_tree1, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); +} + +TEST_F(mount_setattr, two_detached_subtrees_of_same_anonymous_mount_namespace) +{ + int fd_tree1 = -EBADF, fd_tree2 = -EBADF; + + /* + * Copy the following mount tree: + * + * |-/mnt/A testing tmpfs + * `-/mnt/A/AA testing tmpfs + * `-/mnt/A/AA/B testing tmpfs + * `-/mnt/A/AA/B/BB testing tmpfs + */ + fd_tree1 = sys_open_tree(-EBADF, "/mnt/A", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_RECURSIVE | OPEN_TREE_CLOEXEC | + OPEN_TREE_CLONE); + ASSERT_GE(fd_tree1, 0); + + /* + * Create an O_PATH file descriptors with a separate struct file that + * refers to a subtree of the same detached mount tree as @fd_tree1 + */ + fd_tree2 = sys_open_tree(fd_tree1, "AA", + AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW | + AT_EMPTY_PATH | OPEN_TREE_CLOEXEC); + ASSERT_GE(fd_tree2, 0); + + /* + * This must fail as it is only possible to attach the root of a + * detached mount tree. + */ + ASSERT_NE(move_mount(fd_tree2, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); + + ASSERT_EQ(move_mount(fd_tree1, "", -EBADF, "/tmp/target1", MOVE_MOUNT_F_EMPTY_PATH), 0); +} + TEST_HARNESS_MAIN |