12 files changed, 297 insertions, 57 deletions
diff --git a/Documentation/filesystems/debugfs.rst b/Documentation/filesystems/debugfs.rst
index 610f718ef8b5..55f807293924 100644
--- a/Documentation/filesystems/debugfs.rst
+++ b/Documentation/filesystems/debugfs.rst
@@ -229,22 +229,15 @@ module is unloaded without explicitly removing debugfs entries, the result
 will be a lot of stale pointers and no end of highly antisocial behavior.
 So all debugfs users - at least those which can be built as modules - must
 be prepared to remove all files and directories they create there.  A file
-can be removed with::
+or directory can be removed with::
 
     void debugfs_remove(struct dentry *dentry);
 
 The dentry value can be NULL or an error value, in which case nothing will
-be removed.
-
-Once upon a time, debugfs users were required to remember the dentry
-pointer for every debugfs file they created so that all files could be
-cleaned up.  We live in more civilized times now, though, and debugfs users
-can call::
-
-    void debugfs_remove_recursive(struct dentry *dentry);
-
-If this function is passed a pointer for the dentry corresponding to the
-top-level directory, the entire hierarchy below that directory will be
-removed.
+be removed.  Note that this function will recursively remove all files and
+directories underneath it.  Previously, debugfs_remove_recursive() was used
+to perform that task, but this function is now just an alias to
+debugfs_remove().  debugfs_remove_recursive() should be considered
+deprecated.
 
 .. [1] http://lwn.net/Articles/309298/
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index e15c4275862a..440e4ae74e44 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -182,32 +182,34 @@ fault_type=%d		 Support configuring fault injection type, should be
 			 enabled with fault_injection option, fault type value
 			 is shown below, it supports single or combined type.
 
-			 ===========================      ===========
+			 ===========================      ==========
 			 Type_Name                        Type_Value
-			 ===========================      ===========
-			 FAULT_KMALLOC                    0x000000001
-			 FAULT_KVMALLOC                   0x000000002
-			 FAULT_PAGE_ALLOC                 0x000000004
-			 FAULT_PAGE_GET                   0x000000008
-			 FAULT_ALLOC_BIO                  0x000000010 (obsolete)
-			 FAULT_ALLOC_NID                  0x000000020
-			 FAULT_ORPHAN                     0x000000040
-			 FAULT_BLOCK                      0x000000080
-			 FAULT_DIR_DEPTH                  0x000000100
-			 FAULT_EVICT_INODE                0x000000200
-			 FAULT_TRUNCATE                   0x000000400
-			 FAULT_READ_IO                    0x000000800
-			 FAULT_CHECKPOINT                 0x000001000
-			 FAULT_DISCARD                    0x000002000
-			 FAULT_WRITE_IO                   0x000004000
-			 FAULT_SLAB_ALLOC                 0x000008000
-			 FAULT_DQUOT_INIT                 0x000010000
-			 FAULT_LOCK_OP                    0x000020000
-			 FAULT_BLKADDR_VALIDITY           0x000040000
-			 FAULT_BLKADDR_CONSISTENCE        0x000080000
-			 FAULT_NO_SEGMENT                 0x000100000
-			 FAULT_INCONSISTENT_FOOTER        0x000200000
-			 ===========================      ===========
+			 ===========================      ==========
+			 FAULT_KMALLOC                    0x00000001
+			 FAULT_KVMALLOC                   0x00000002
+			 FAULT_PAGE_ALLOC                 0x00000004
+			 FAULT_PAGE_GET                   0x00000008
+			 FAULT_ALLOC_BIO                  0x00000010 (obsolete)
+			 FAULT_ALLOC_NID                  0x00000020
+			 FAULT_ORPHAN                     0x00000040
+			 FAULT_BLOCK                      0x00000080
+			 FAULT_DIR_DEPTH                  0x00000100
+			 FAULT_EVICT_INODE                0x00000200
+			 FAULT_TRUNCATE                   0x00000400
+			 FAULT_READ_IO                    0x00000800
+			 FAULT_CHECKPOINT                 0x00001000
+			 FAULT_DISCARD                    0x00002000
+			 FAULT_WRITE_IO                   0x00004000
+			 FAULT_SLAB_ALLOC                 0x00008000
+			 FAULT_DQUOT_INIT                 0x00010000
+			 FAULT_LOCK_OP                    0x00020000
+			 FAULT_BLKADDR_VALIDITY           0x00040000
+			 FAULT_BLKADDR_CONSISTENCE        0x00080000
+			 FAULT_NO_SEGMENT                 0x00100000
+			 FAULT_INCONSISTENT_FOOTER        0x00200000
+			 FAULT_TIMEOUT                    0x00400000 (1000ms)
+			 FAULT_VMALLOC                    0x00800000
+			 ===========================      ==========
 mode=%s			 Control block allocation mode which supports "adaptive"
 			 and "lfs". In "lfs" mode, there should be no random
 			 writes towards main area.
diff --git a/Documentation/filesystems/fuse-passthrough.rst b/Documentation/filesystems/fuse-passthrough.rst
new file mode 100644
index 000000000000..2b0e7c2da54a
--- /dev/null
+++ b/Documentation/filesystems/fuse-passthrough.rst
@@ -0,0 +1,133 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+FUSE Passthrough
+================
+
+Introduction
+============
+
+FUSE (Filesystem in Userspace) passthrough is a feature designed to improve the
+performance of FUSE filesystems for I/O operations. Typically, FUSE operations
+involve communication between the kernel and a userspace FUSE daemon, which can
+incur overhead. Passthrough allows certain operations on a FUSE file to bypass
+the userspace daemon and be executed directly by the kernel on an underlying
+"backing file".
+
+This is achieved by the FUSE daemon registering a file descriptor (pointing to
+the backing file on a lower filesystem) with the FUSE kernel module. The kernel
+then receives an identifier (``backing_id``) for this registered backing file.
+When a FUSE file is subsequently opened, the FUSE daemon can, in its response to
+the ``OPEN`` request, include this ``backing_id`` and set the
+``FOPEN_PASSTHROUGH`` flag. This establishes a direct link for specific
+operations.
+
+Currently, passthrough is supported for operations like ``read(2)``/``write(2)``
+(via ``read_iter``/``write_iter``), ``splice(2)``, and ``mmap(2)``.
+
+Enabling Passthrough
+====================
+
+To use FUSE passthrough:
+
+  1. The FUSE filesystem must be compiled with ``CONFIG_FUSE_PASSTHROUGH``
+     enabled.
+  2. The FUSE daemon, during the ``FUSE_INIT`` handshake, must negotiate the
+     ``FUSE_PASSTHROUGH`` capability and specify its desired
+     ``max_stack_depth``.
+  3. The (privileged) FUSE daemon uses the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl
+     on its connection file descriptor (e.g., ``/dev/fuse``) to register a
+     backing file descriptor and obtain a ``backing_id``.
+  4. When handling an ``OPEN`` or ``CREATE`` request for a FUSE file, the daemon
+     replies with the ``FOPEN_PASSTHROUGH`` flag set in
+     ``fuse_open_out::open_flags`` and provides the corresponding ``backing_id``
+     in ``fuse_open_out::backing_id``.
+  5. The FUSE daemon should eventually call ``FUSE_DEV_IOC_BACKING_CLOSE`` with
+     the ``backing_id`` to release the kernel's reference to the backing file
+     when it's no longer needed for passthrough setups.
+
+Privilege Requirements
+======================
+
+Setting up passthrough functionality currently requires the FUSE daemon to
+possess the ``CAP_SYS_ADMIN`` capability. This requirement stems from several
+security and resource management considerations that are actively being
+discussed and worked on. The primary reasons for this restriction are detailed
+below.
+
+Resource Accounting and Visibility
+----------------------------------
+
+The core mechanism for passthrough involves the FUSE daemon opening a file
+descriptor to a backing file and registering it with the FUSE kernel module via
+the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl. This ioctl returns a ``backing_id``
+associated with a kernel-internal ``struct fuse_backing`` object, which holds a
+reference to the backing ``struct file``.
+
+A significant concern arises because the FUSE daemon can close its own file
+descriptor to the backing file after registration. The kernel, however, will
+still hold a reference to the ``struct file`` via the ``struct fuse_backing``
+object as long as it's associated with a ``backing_id`` (or subsequently, with
+an open FUSE file in passthrough mode).
+
+This behavior leads to two main issues for unprivileged FUSE daemons:
+
+  1. **Invisibility to lsof and other inspection tools**: Once the FUSE
+     daemon closes its file descriptor, the open backing file held by the kernel
+     becomes "hidden." Standard tools like ``lsof``, which typically inspect
+     process file descriptor tables, would not be able to identify that this
+     file is still open by the system on behalf of the FUSE filesystem. This
+     makes it difficult for system administrators to track resource usage or
+     debug issues related to open files (e.g., preventing unmounts).
+
+  2. **Bypassing RLIMIT_NOFILE**: The FUSE daemon process is subject to
+     resource limits, including the maximum number of open file descriptors
+     (``RLIMIT_NOFILE``). If an unprivileged daemon could register backing files
+     and then close its own FDs, it could potentially cause the kernel to hold
+     an unlimited number of open ``struct file`` references without these being
+     accounted against the daemon's ``RLIMIT_NOFILE``. This could lead to a
+     denial-of-service (DoS) by exhausting system-wide file resources.
+
+The ``CAP_SYS_ADMIN`` requirement acts as a safeguard against these issues,
+restricting this powerful capability to trusted processes.
+
+**NOTE**: ``io_uring`` solves this similar issue by exposing its "fixed files",
+which are visible via ``fdinfo`` and accounted under the registering user's
+``RLIMIT_NOFILE``.
+
+Filesystem Stacking and Shutdown Loops
+--------------------------------------
+
+Another concern relates to the potential for creating complex and problematic
+filesystem stacking scenarios if unprivileged users could set up passthrough.
+A FUSE passthrough filesystem might use a backing file that resides:
+
+  * On the *same* FUSE filesystem.
+  * On another filesystem (like OverlayFS) which itself might have an upper or
+    lower layer that is a FUSE filesystem.
+
+These configurations could create dependency loops, particularly during
+filesystem shutdown or unmount sequences, leading to deadlocks or system
+instability. This is conceptually similar to the risks associated with the
+``LOOP_SET_FD`` ioctl, which also requires ``CAP_SYS_ADMIN``.
+
+To mitigate this, FUSE passthrough already incorporates checks based on
+filesystem stacking depth (``sb->s_stack_depth`` and ``fc->max_stack_depth``).
+For example, during the ``FUSE_INIT`` handshake, the FUSE daemon can negotiate
+the ``max_stack_depth`` it supports. When a backing file is registered via
+``FUSE_DEV_IOC_BACKING_OPEN``, the kernel checks if the backing file's
+filesystem stack depth is within the allowed limit.
+
+The ``CAP_SYS_ADMIN`` requirement provides an additional layer of security,
+ensuring that only privileged users can create these potentially complex
+stacking arrangements.
+
+General Security Posture
+------------------------
+
+As a general principle for new kernel features that allow userspace to instruct
+the kernel to perform direct operations on its behalf based on user-provided
+file descriptors, starting with a higher privilege requirement (like
+``CAP_SYS_ADMIN``) is a conservative and common security practice. This allows
+the feature to be used and tested while further security implications are
+evaluated and addressed.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 32618512a965..11a599387266 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -99,6 +99,7 @@ Documentation for filesystem implementations.
    fuse
    fuse-io
    fuse-io-uring
+   fuse-passthrough
    inotify
    isofs
    nilfs2
diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst
index 939b4b624fad..ddd799df6ce3 100644
--- a/Documentation/filesystems/netfs_library.rst
+++ b/Documentation/filesystems/netfs_library.rst
@@ -712,11 +712,6 @@ handle falling back from one source type to another.  The members are:
      at a boundary with the filesystem structure (e.g. at the end of a Ceph
      object).  It tells netfslib not to retile subrequests across it.
 
-   * ``NETFS_SREQ_SEEK_DATA_READ``
-
-     This is a hint from netfslib to the cache that it might want to try
-     skipping ahead to the next data (ie. using SEEK_DATA).
-
  * ``error``
 
    This is for the filesystem to store result of the subrequest.  It should be
diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
index 2db379b4b31e..4133a336486d 100644
--- a/Documentation/filesystems/overlayfs.rst
+++ b/Documentation/filesystems/overlayfs.rst
@@ -443,6 +443,13 @@ Only the data of the files in the "data-only" lower layers may be visible
 when a "metacopy" file in one of the lower layers above it, has a "redirect"
 to the absolute path of the "lower data" file in the "data-only" lower layer.
 
+Instead of explicitly enabling "metacopy=on" it is sufficient to specify at
+least one data-only layer to enable redirection of data to a data-only layer.
+In this case other forms of metacopy are rejected.  Note: this way data-only
+layers may be used toghether with "userxattr", in which case careful attention
+must be given to privileges needed to change the "user.overlay.redirect" xattr
+to prevent misuse.
+
 Since kernel version v6.8, "data-only" lower layers can also be added using
 the "datadir+" mount options and the fsconfig syscall from new mount api.
 For example::
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 3111ef5592f3..a5734bdd1cc7 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1243,3 +1243,18 @@ arguments in the opposite order but is otherwise identical.
 
 Using try_lookup_noperm() will require linux/namei.h to be included.
 
+---
+
+**mandatory**
+
+Calling conventions for ->d_automount() have changed; we should *not* grab
+an extra reference to new mount - it should be returned with refcount 1.
+
+---
+
+collect_mounts()/drop_collected_mounts()/iterate_mounts() are gone now.
+Replacement is collect_paths()/drop_collected_path(), with no special
+iterator needed.  Instead of a cloned mount tree, the new interface returns
+an array of struct path, one for each mount collect_mounts() would've
+created.  These struct path point to locations in the caller's namespace
+that would be roots of the cloned mounts.
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 2a17865dfe39..5236cb52e357 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -584,7 +584,6 @@ encoded manner. The codes are the following:
     ms    may share
     gd    stack segment growns down
     pf    pure PFN range
-    dw    disabled write to the mapped file
     lo    pages are locked in memory
     io    memory mapped I/O area
     sr    sequential read advise provided
@@ -607,8 +606,11 @@ encoded manner. The codes are the following:
     mt    arm64 MTE allocation tags are enabled
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
+    ui    userfaultfd minor fault
     ss    shadow/guarded control stack page
     sl    sealed
+    lf    lock on fault pages
+    dp    always lazily freeable mapping
     ==    =======================================
 
 Note that there is no guarantee that every flag and associated mnemonic will
diff --git a/Documentation/filesystems/relay.rst b/Documentation/filesystems/relay.rst
index 46447dbc75ad..301ff4c6e6c6 100644
--- a/Documentation/filesystems/relay.rst
+++ b/Documentation/filesystems/relay.rst
@@ -301,16 +301,6 @@ user-defined data with a channel, and is immediately available
 (including in create_buf_file()) via chan->private_data or
 buf->chan->private_data.
 
-Buffer-only channels
---------------------
-
-These channels have no files associated and can be created with
-relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
-as when doing early tracing in the kernel, before the VFS is up. In these
-cases, one may open a buffer-only channel and then call
-relay_late_setup_files() when the kernel is ready to handle files,
-to expose the buffered data to the userspace.
-
 Channel 'modes'
 ---------------
 
diff --git a/Documentation/filesystems/smb/index.rst b/Documentation/filesystems/smb/index.rst
index 1c8597a679ab..6df23b0e45c8 100644
--- a/Documentation/filesystems/smb/index.rst
+++ b/Documentation/filesystems/smb/index.rst
@@ -8,3 +8,4 @@ CIFS
 
    ksmbd
    cifsroot
+   smbdirect
diff --git a/Documentation/filesystems/smb/smbdirect.rst b/Documentation/filesystems/smb/smbdirect.rst
new file mode 100644
index 000000000000..ca6927c0b2c0
--- /dev/null
+++ b/Documentation/filesystems/smb/smbdirect.rst
@@ -0,0 +1,103 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+SMB Direct - SMB3 over RDMA
+===========================
+
+This document describes how to set up the Linux SMB client and server to
+use RDMA.
+
+Overview
+========
+The Linux SMB kernel client supports SMB Direct, which is a transport
+scheme for SMB3 that uses RDMA (Remote Direct Memory Access) to provide
+high throughput and low latencies by bypassing the traditional TCP/IP
+stack.
+SMB Direct on the Linux SMB client can be tested against KSMBD - a
+kernel-space SMB server.
+
+Installation
+=============
+- Install an RDMA device. As long as the RDMA device driver is supported
+  by the kernel, it should work. This includes both software emulators (soft
+  RoCE, soft iWARP) and hardware devices (InfiniBand, RoCE, iWARP).
+
+- Install a kernel with SMB Direct support. The first kernel release to
+  support SMB Direct on both the client and server side is 5.15. Therefore,
+  a distribution compatible with kernel 5.15 or later is required.
+
+- Install cifs-utils, which provides the `mount.cifs` command to mount SMB
+  shares.
+
+- Configure the RDMA stack
+
+  Make sure that your kernel configuration has RDMA support enabled. Under
+  Device Drivers -> Infiniband support, update the kernel configuration to
+  enable Infiniband support.
+
+  Enable the appropriate IB HCA support or iWARP adapter support,
+  depending on your hardware.
+
+  If you are using InfiniBand, enable IP-over-InfiniBand support.
+
+  For soft RDMA, enable either the soft iWARP (`RDMA _SIW`) or soft RoCE
+  (`RDMA_RXE`) module. Install the `iproute2` package and use the
+  `rdma link add` command to load the module and create an
+  RDMA interface.
+
+  e.g. if your local ethernet interface is `eth0`, you can use:
+
+    .. code-block:: bash
+
+        sudo rdma link add siw0 type siw netdev eth0
+
+- Enable SMB Direct support for both the server and the client in the kernel
+  configuration.
+
+    Server Setup
+
+    .. code-block:: text
+
+        Network File Systems  --->
+            <M> SMB3 server support
+                [*] Support for SMB Direct protocol
+
+    Client Setup
+
+    .. code-block:: text
+
+        Network File Systems  --->
+            <M> SMB3 and CIFS support (advanced network filesystem)
+                [*] SMB Direct support
+
+- Build and install the kernel. SMB Direct support will be enabled in the
+  cifs.ko and ksmbd.ko modules.
+
+Setup and Usage
+================
+
+- Set up and start a KSMBD server as described in the `KSMBD documentation
+  <https://www.kernel.org/doc/Documentation/filesystems/smb/ksmbd.rst>`_.
+  Also add the "server multi channel support = yes" parameter to ksmbd.conf.
+
+- On the client, mount the share with `rdma` mount option to use SMB Direct
+  (specify a SMB version 3.0 or higher using `vers`).
+
+  For example:
+
+    .. code-block:: bash
+
+        mount -t cifs //server/share /mnt/point -o vers=3.1.1,rdma
+
+- To verify that the mount is using SMB Direct, you can check dmesg for the
+  following log line after mounting:
+
+    .. code-block:: text
+
+        CIFS: VFS: RDMA transport established
+
+  Or, verify `rdma` mount option for the share in `/proc/mounts`:
+
+    .. code-block:: bash
+
+        cat /proc/mounts | grep cifs
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index bf051c7da6b8..fd32a9a17bfb 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1390,9 +1390,7 @@ defined:
 
 	If a vfsmount is returned, the caller will attempt to mount it
 	on the mountpoint and will remove the vfsmount from its
-	expiration list in the case of failure.  The vfsmount should be
-	returned with 2 refs on it to prevent automatic expiration - the
-	caller will clean up the additional ref.
+	expiration list in the case of failure.
 
 	This function is only used if DCACHE_NEED_AUTOMOUNT is set on
 	the dentry.  This is set by __d_instantiate() if S_AUTOMOUNT is