1 files changed, 143 insertions, 63 deletions
diff --git a/Documentation/filesystems/idmappings.rst b/Documentation/filesystems/idmappings.rst
index c1db8748389c..ac0af679e61e 100644
--- a/Documentation/filesystems/idmappings.rst
+++ b/Documentation/filesystems/idmappings.rst
@@ -36,7 +36,7 @@ and write down the mappings it will generate::
 From a mathematical viewpoint ``U`` and ``K`` are well-ordered sets and an
 idmapping is an order isomorphism from ``U`` into ``K``. So ``U`` and ``K`` are
 order isomorphic. In fact, ``U`` and ``K`` are always well-ordered subsets of
-the set of all possible ids useable on a given system.
+the set of all possible ids usable on a given system.
 
 Looking at this mathematically briefly will help us highlight some properties
 that make it easier to understand how we can translate between idmappings. For
@@ -47,7 +47,7 @@ example, we know that the inverse idmapping is an order isomorphism as well::
  k10002 -> u24
 
 Given that we are dealing with order isomorphisms plus the fact that we're
-dealing with subsets we can embedd idmappings into each other, i.e. we can
+dealing with subsets we can embed idmappings into each other, i.e. we can
 sensibly translate between different idmappings. For example, assume we've been
 given the three idmappings::
 
@@ -60,7 +60,7 @@ and id ``k11000`` which has been generated by the first idmapping by mapping
 
 Because we're dealing with order isomorphic subsets it is meaningful to ask
 what id ``k11000`` corresponds to in the second or third idmapping. The
-straightfoward algorithm to use is to apply the inverse of the first idmapping,
+straightforward algorithm to use is to apply the inverse of the first idmapping,
 mapping ``k11000`` up to ``u1000``. Afterwards, we can map ``u1000`` down using
 either the second idmapping mapping or third idmapping mapping. The second
 idmapping would map ``u1000`` down to ``21000``. The third idmapping would map
@@ -146,9 +146,10 @@ For the rest of this document we will prefix all userspace ids with ``u`` and
 all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
 an idmapping will be written as ``u0:k10000:r10000``.
 
-For example, the id ``u1000`` is an id in the upper idmapset or "userspace
-idmapset" starting with ``u1000``. And it is mapped to ``k11000`` which is a
-kernel id in the lower idmapset or "kernel idmapset" starting with ``k10000``.
+For example, within this idmapping, the id ``u1000`` is an id in the upper
+idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to
+``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset"
+starting with ``k10000``.
 
 A kernel id is always created by an idmapping. Such idmappings are associated
 with user namespaces. Since we mainly care about how idmappings work we're not
@@ -241,7 +242,7 @@ according to the filesystem's idmapping as this would give the wrong owner if
 the caller is using an idmapping.
 
 So the kernel will map the id back up in the idmapping of the caller. Let's
-assume the caller has the slighly unconventional idmapping
+assume the caller has the somewhat unconventional idmapping
 ``u3000:k20000:r10000`` then ``k21000`` would map back up to ``u4000``.
 Consequently the user would see that this file is owned by ``u4000``.
 
@@ -320,6 +321,10 @@ and equally wrong::
  from_kuid(u20000:k0:r10000, u1000) = k21000
                              ~~~~~
 
+Since userspace ids have type ``uid_t`` and ``gid_t`` and kernel ids have type
+``kuid_t`` and ``kgid_t``, the compiler will throw an error when they are
+conflated. So the two examples above would cause a compilation failure.
+
 Idmappings when creating filesystem objects
 -------------------------------------------
 
@@ -363,12 +368,19 @@ So with the second step the kernel guarantees that a valid userspace id can be
 written to disk. If it can't the kernel will refuse the creation request to not
 even remotely risk filesystem corruption.
 
-The astute reader will have realized that this is simply a varation of the
+The astute reader will have realized that this is simply a variation of the
 crossmapping algorithm we mentioned above in a previous section. First, the
 kernel maps the caller's userspace id down into a kernel id according to the
 caller's idmapping and then maps that kernel id up according to the
 filesystem's idmapping.
 
+From an implementation point of view it's worth mentioning how idmappings are represented.
+All idmappings are taken from the corresponding user namespace.
+
+    - caller's idmapping (usually taken from ``current_user_ns()``)
+    - filesystem's idmapping (``sb->s_user_ns``)
+    - mount's idmapping (``mnt_idmap(vfsmnt)``)
+
 Let's see some examples with caller/filesystem idmapping but without mount
 idmappings. This will exhibit some problems we can hit. After that we will
 revisit/reconsider these examples, this time using mount idmappings, to see how
@@ -454,7 +466,7 @@ the kernel id that was created in the caller's idmapping. This has mainly two
 consequences.
 
 First, that we can't allow a caller to ultimately write to disk with another
-userspace id. We could only do this if we were to mount the whole fileystem
+userspace id. We could only do this if we were to mount the whole filesystem
 with the caller's or another idmapping. But that solution is limited to a few
 filesystems and not very flexible. But this is a use-case that is pretty
 important in containerized workloads.
@@ -585,7 +597,7 @@ on their work machine.
 
 In both cases changing ownership recursively has grave implications. The most
 obvious one is that ownership is changed globally and permanently. In the home
-directory case this change in ownership would even need to happen everytime the
+directory case this change in ownership would even need to happen every time the
 user switches from their home to their work machine. For really large sets of
 files this becomes increasingly costly.
 
@@ -623,45 +635,108 @@ privileged users in the initial user namespace.
 However, it is perfectly possible to combine idmapped mounts with filesystems
 mountable inside user namespaces. We will touch on this further below.
 
+Filesystem types vs idmapped mount types
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With the introduction of idmapped mounts we need to distinguish between
+filesystem ownership and mount ownership of a VFS object such as an inode. The
+owner of an inode might be different when looked at from a filesystem
+perspective than when looked at from an idmapped mount. Such fundamental
+conceptual distinctions should almost always be clearly expressed in the code.
+So, to distinguish idmapped mount ownership from filesystem ownership separate
+types have been introduced.
+
+If a uid or gid has been generated using the filesystem or caller's idmapping
+then we will use the ``kuid_t`` and ``kgid_t`` types. However, if a uid or gid
+has been generated using a mount idmapping then we will be using the dedicated
+``vfsuid_t`` and ``vfsgid_t`` types.
+
+All VFS helpers that generate or take uids and gids as arguments use the
+``vfsuid_t`` and ``vfsgid_t`` types and we will be able to rely on the compiler
+to catch errors that originate from conflating filesystem and VFS uids and gids.
+
+The ``vfsuid_t`` and ``vfsgid_t`` types are often mapped from and to ``kuid_t``
+and ``kgid_t`` types similar to how ``kuid_t`` and ``kgid_t`` types are mapped
+from and to ``uid_t`` and ``gid_t`` types::
+
+ uid_t <--> kuid_t <--> vfsuid_t
+ gid_t <--> kgid_t <--> vfsgid_t
+
+Whenever we report ownership based on a ``vfsuid_t`` or ``vfsgid_t`` type,
+e.g., during ``stat()``, or store ownership information in a shared VFS object
+based on a ``vfsuid_t`` or ``vfsgid_t`` type, e.g., during ``chown()`` we can
+use the ``vfsuid_into_kuid()`` and ``vfsgid_into_kgid()`` helpers.
+
+To illustrate why these helpers currently exist, consider what happens when we
+change ownership of an inode from an idmapped mount. After we have generated
+a ``vfsuid_t`` or ``vfsgid_t`` based on the mount idmapping we later commit
+this ``vfsuid_t`` or ``vfsgid_t`` as the new filesystem-wide ownership.
+Thus, we are turning the ``vfsuid_t`` or ``vfsgid_t`` into a global ``kuid_t``
+or ``kgid_t``. And this can be done by using ``vfsuid_into_kuid()`` and
+``vfsgid_into_kgid()``.
+
+Note, whenever a shared VFS object, e.g., a cached ``struct inode`` or a cached
+``struct posix_acl``, stores ownership information, a filesystem or "global"
+``kuid_t`` and ``kgid_t`` must be used. Ownership expressed via ``vfsuid_t``
+and ``vfsgid_t`` is specific to an idmapped mount.
+
+We already noted that ``vfsuid_t`` and ``vfsgid_t`` types are generated based
+on mount idmappings whereas ``kuid_t`` and ``kgid_t`` types are generated based
+on filesystem idmappings. To prevent abusing filesystem idmappings to generate
+``vfsuid_t`` or ``vfsgid_t`` types or mount idmappings to generate ``kuid_t``
+or ``kgid_t`` types, filesystem idmappings and mount idmappings are different
+types as well.
+
+All helpers that map to or from ``vfsuid_t`` and ``vfsgid_t`` types require
+a mount idmapping to be passed which is of type ``struct mnt_idmap``. Passing
+a filesystem or caller idmapping will cause a compilation error.
+
+Similar to how we prefix all userspace ids in this document with ``u`` and all
+kernel ids with ``k`` we will prefix all VFS ids with ``v``. So a mount
+idmapping will be written as: ``u0:v10000:r10000``.
+
 Remapping helpers
 ~~~~~~~~~~~~~~~~~
 
 Idmapping functions were added that translate between idmappings. They make use
-of the remapping algorithm we've introduced earlier. We're going to look at
-two:
+of the remapping algorithm we've introduced earlier. We're going to look at:
 
-- ``i_uid_into_mnt()`` and ``i_gid_into_mnt()``
+- ``i_uid_into_vfsuid()`` and ``i_gid_into_vfsgid()``
 
-  The ``i_*id_into_mnt()`` functions translate filesystem's kernel ids into
-  kernel ids in the mount's idmapping::
+  The ``i_*id_into_vfs*id()`` functions translate filesystem's kernel ids into
+  VFS ids in the mount's idmapping::
 
    /* Map the filesystem's kernel id up into a userspace id in the filesystem's idmapping. */
    from_kuid(filesystem, kid) = uid
 
-   /* Map the filesystem's userspace id down ito a kernel id in the mount's idmapping. */
+   /* Map the filesystem's userspace id down into a VFS id in the mount's idmapping. */
    make_kuid(mount, uid) = kuid
 
 - ``mapped_fsuid()`` and ``mapped_fsgid()``
 
   The ``mapped_fs*id()`` functions translate the caller's kernel ids into
   kernel ids in the filesystem's idmapping. This translation is achieved by
-  remapping the caller's kernel ids using the mount's idmapping::
+  remapping the caller's VFS ids using the mount's idmapping::
 
-   /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
+   /* Map the caller's VFS id up into a userspace id in the mount's idmapping. */
    from_kuid(mount, kid) = uid
 
    /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
    make_kuid(filesystem, uid) = kuid
 
+- ``vfsuid_into_kuid()`` and ``vfsgid_into_kgid()``
+
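+   Whenever a VFS id is reported, e.g., during ``stat()``, or stored as ownership, e.g., during ``chown()``, these helpers turn it into a kernel id.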
+
 Note that these two functions invert each other. Consider the following
 idmappings::
 
  caller idmapping:     u0:k10000:r10000
  filesystem idmapping: u0:k20000:r10000
- mount idmapping:      u0:k10000:r10000
+ mount idmapping:      u0:v10000:r10000
 
 Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
-to ``k21000`` according to it's idmapping. This is what is stored in the
+to ``k21000`` according to its idmapping. This is what is stored in the
 inode's ``i_uid`` and ``i_gid`` fields.
 
 When the caller queries the ownership of this file via ``stat()`` the kernel
@@ -669,20 +744,21 @@ would usually simply use the crossmapping algorithm and map the filesystem's
 kernel id up to a userspace id in the caller's idmapping.
 
 But when the caller is accessing the file on an idmapped mount the kernel will
-first call ``i_uid_into_mnt()`` thereby translating the filesystem's kernel id
-into a kernel id in the mount's idmapping::
+first call ``i_uid_into_vfsuid()`` thereby translating the filesystem's kernel
+id into a VFS id in the mount's idmapping::
 
- i_uid_into_mnt(k21000):
+ i_uid_into_vfsuid(k21000):
    /* Map the filesystem's kernel id up into a userspace id. */
    from_kuid(u0:k20000:r10000, k21000) = u1000
 
-   /* Map the filesystem's userspace id down ito a kernel id in the mount's idmapping. */
-   make_kuid(u0:k10000:r10000, u1000) = k11000
+   /* Map the filesystem's userspace id down into a VFS id in the mount's idmapping. */
+   make_kuid(u0:v10000:r10000, u1000) = v11000
 
 Finally, when the kernel reports the owner to the caller it will turn the
-kernel id in the mount's idmapping into a userspace id in the caller's
+VFS id in the mount's idmapping into a userspace id in the caller's
 idmapping::
 
+  k11000 = vfsuid_into_kuid(v11000)
   from_kuid(u0:k10000:r10000, k11000) = u1000
 
 We can test whether this algorithm really works by verifying what happens when
@@ -696,18 +772,19 @@ fails.
 
 But when the caller is accessing the file on an idmapped mount the kernel will
 first call ``mapped_fs*id()`` thereby translating the caller's kernel id into
-a kernel id according to the mount's idmapping::
+a VFS id according to the mount's idmapping::
 
  mapped_fsuid(k11000):
     /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
     from_kuid(u0:k10000:r10000, k11000) = u1000
 
     /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
-    make_kuid(u0:k20000:r10000, u1000) = k21000
+    make_kuid(u0:v20000:r10000, u1000) = v21000
 
-When finally writing to disk the kernel will then map ``k21000`` up into a
+When finally writing to disk the kernel will then map ``v21000`` up into a
 userspace id in the filesystem's idmapping::
 
+   k21000 = vfsuid_into_kuid(v21000)
    from_kuid(u0:k20000:r10000, k21000) = u1000
 
 As we can see, we end up with an invertible and therefore information
@@ -725,7 +802,7 @@ Example 2 reconsidered
  caller id:            u1000
  caller idmapping:     u0:k10000:r10000
  filesystem idmapping: u0:k20000:r10000
- mount idmapping:      u0:k10000:r10000
+ mount idmapping:      u0:v10000:r10000
 
 When the caller is using a non-initial idmapping the common case is to attach
 the same idmapping to the mount. We now perform three steps:
@@ -734,12 +811,12 @@ the same idmapping to the mount. We now perform three steps:
 
     make_kuid(u0:k10000:r10000, u1000) = k11000
 
-2. Translate the caller's kernel id into a kernel id in the filesystem's
+2. Translate the caller's VFS id into a kernel id in the filesystem's
    idmapping::
 
-    mapped_fsuid(k11000):
-      /* Map the kernel id up into a userspace id in the mount's idmapping. */
-      from_kuid(u0:k10000:r10000, k11000) = u1000
+    mapped_fsuid(v11000):
+      /* Map the VFS id up into a userspace id in the mount's idmapping. */
+      from_kuid(u0:v10000:r10000, v11000) = u1000
 
       /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
       make_kuid(u0:k20000:r10000, u1000) = k21000
@@ -759,7 +836,7 @@ Example 3 reconsidered
  caller id:            u1000
  caller idmapping:     u0:k10000:r10000
  filesystem idmapping: u0:k0:r4294967295
- mount idmapping:      u0:k10000:r10000
+ mount idmapping:      u0:v10000:r10000
 
 The same translation algorithm works with the third example.
 
@@ -767,12 +844,12 @@ The same translation algorithm works with the third example.
 
     make_kuid(u0:k10000:r10000, u1000) = k11000
 
-2. Translate the caller's kernel id into a kernel id in the filesystem's
+2. Translate the caller's VFS id into a kernel id in the filesystem's
    idmapping::
 
-    mapped_fsuid(k11000):
-       /* Map the kernel id up into a userspace id in the mount's idmapping. */
-       from_kuid(u0:k10000:r10000, k11000) = u1000
+    mapped_fsuid(v11000):
+       /* Map the VFS id up into a userspace id in the mount's idmapping. */
+       from_kuid(u0:v10000:r10000, v11000) = u1000
 
        /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
        make_kuid(u0:k0:r4294967295, u1000) = k1000
@@ -792,7 +869,7 @@ Example 4 reconsidered
  file id:              u1000
  caller idmapping:     u0:k10000:r10000
  filesystem idmapping: u0:k0:r4294967295
- mount idmapping:      u0:k10000:r10000
+ mount idmapping:      u0:v10000:r10000
 
 In order to report ownership to userspace the kernel now does three steps using
 the translation algorithm we introduced earlier:
@@ -802,17 +879,18 @@ the translation algorithm we introduced earlier:
 
     make_kuid(u0:k0:r4294967295, u1000) = k1000
 
-2. Translate the kernel id into a kernel id in the mount's idmapping::
+2. Translate the kernel id into a VFS id in the mount's idmapping::
 
-    i_uid_into_mnt(k1000):
+    i_uid_into_vfsuid(k1000):
       /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
       from_kuid(u0:k0:r4294967295, k1000) = u1000
 
-      /* Map the userspace id down into a kernel id in the mounts's idmapping. */
-      make_kuid(u0:k10000:r10000, u1000) = k11000
+      /* Map the userspace id down into a VFS id in the mount's idmapping. */
+      make_kuid(u0:v10000:r10000, u1000) = v11000
 
-3. Map the kernel id up into a userspace id in the caller's idmapping::
+3. Map the VFS id up into a userspace id in the caller's idmapping::
 
+    k11000 = vfsuid_into_kuid(v11000)
     from_kuid(u0:k10000:r10000, k11000) = u1000
 
 Earlier, the caller's kernel id couldn't be crossmapped in the filesystems's
@@ -828,7 +906,7 @@ Example 5 reconsidered
  file id:              u1000
  caller idmapping:     u0:k10000:r10000
  filesystem idmapping: u0:k20000:r10000
- mount idmapping:      u0:k10000:r10000
+ mount idmapping:      u0:v10000:r10000
 
 Again, in order to report ownership to userspace the kernel now does three
 steps using the translation algorithm we introduced earlier:
@@ -838,17 +916,18 @@ steps using the translation algorithm we introduced earlier:
 
     make_kuid(u0:k20000:r10000, u1000) = k21000
 
-2. Translate the kernel id into a kernel id in the mount's idmapping::
+2. Translate the kernel id into a VFS id in the mount's idmapping::
 
-    i_uid_into_mnt(k21000):
+    i_uid_into_vfsuid(k21000):
       /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
       from_kuid(u0:k20000:r10000, k21000) = u1000
 
-      /* Map the userspace id down into a kernel id in the mounts's idmapping. */
-      make_kuid(u0:k10000:r10000, u1000) = k11000
+      /* Map the userspace id down into a VFS id in the mount's idmapping. */
+      make_kuid(u0:v10000:r10000, u1000) = v11000
 
-3. Map the kernel id up into a userspace id in the caller's idmapping::
+3. Map the VFS id up into a userspace id in the caller's idmapping::
 
+    k11000 = vfsuid_into_kuid(v11000)
     from_kuid(u0:k10000:r10000, k11000) = u1000
 
 Earlier, the file's kernel id couldn't be crossmapped in the filesystems's
@@ -899,23 +978,23 @@ from above:::
  caller id:            u1125
  caller idmapping:     u0:k0:r4294967295
  filesystem idmapping: u0:k0:r4294967295
- mount idmapping:      u1000:k1125:r1
+ mount idmapping:      u1000:v1125:r1
 
 1. Map the caller's userspace ids into kernel ids in the caller's idmapping::
 
     make_kuid(u0:k0:r4294967295, u1125) = k1125
 
-2. Translate the caller's kernel id into a kernel id in the filesystem's
+2. Translate the caller's VFS id into a kernel id in the filesystem's
    idmapping::
 
-    mapped_fsuid(k1125):
-      /* Map the kernel id up into a userspace id in the mount's idmapping. */
-      from_kuid(u1000:k1125:r1, k1125) = u1000
+    mapped_fsuid(v1125):
+      /* Map the VFS id up into a userspace id in the mount's idmapping. */
+      from_kuid(u1000:v1125:r1, v1125) = u1000
 
       /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
       make_kuid(u0:k0:r4294967295, u1000) = k1000
 
-2. Verify that the caller's kernel ids can be mapped to userspace ids in the
+3. Verify that the caller's filesystem ids can be mapped to userspace ids in the
    filesystem's idmapping::
 
     from_kuid(u0:k0:r4294967295, k1000) = u1000
@@ -930,24 +1009,25 @@ on their work computer:
  file id:              u1000
  caller idmapping:     u0:k0:r4294967295
  filesystem idmapping: u0:k0:r4294967295
- mount idmapping:      u1000:k1125:r1
+ mount idmapping:      u1000:v1125:r1
 
 1. Map the userspace id on disk down into a kernel id in the filesystem's
    idmapping::
 
     make_kuid(u0:k0:r4294967295, u1000) = k1000
 
-2. Translate the kernel id into a kernel id in the mount's idmapping::
+2. Translate the kernel id into a VFS id in the mount's idmapping::
 
-    i_uid_into_mnt(k1000):
+    i_uid_into_vfsuid(k1000):
       /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
       from_kuid(u0:k0:r4294967295, k1000) = u1000
 
-      /* Map the userspace id down into a kernel id in the mounts's idmapping. */
-      make_kuid(u1000:k1125:r1, u1000) = k1125
+      /* Map the userspace id down into a VFS id in the mount's idmapping. */
+      make_kuid(u1000:v1125:r1, u1000) = v1125
 
-3. Map the kernel id up into a userspace id in the caller's idmapping::
+3. Map the VFS id up into a userspace id in the caller's idmapping::
 
+    k1125 = vfsuid_into_kuid(v1125)
     from_kuid(u0:k0:r4294967295, k1125) = u1125
 
 So ultimately the caller will be reported that the file belongs to ``u1125``