aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/api-summary.rst150
-rw-r--r--Documentation/filesystems/binderfs.rst68
-rw-r--r--Documentation/filesystems/ceph.txt14
-rw-r--r--Documentation/filesystems/exofs.txt185
-rw-r--r--Documentation/filesystems/fscrypt.rst16
-rw-r--r--Documentation/filesystems/index.rst389
-rw-r--r--Documentation/filesystems/journalling.rst184
-rw-r--r--Documentation/filesystems/mount_api.txt709
-rw-r--r--Documentation/filesystems/path-lookup.rst39
-rw-r--r--Documentation/filesystems/splice.rst22
-rw-r--r--Documentation/filesystems/sysfs.txt25
-rw-r--r--Documentation/filesystems/vfs.txt3
-rw-r--r--Documentation/filesystems/xfs.txt3
13 files changed, 1234 insertions, 573 deletions
diff --git a/Documentation/filesystems/api-summary.rst b/Documentation/filesystems/api-summary.rst
new file mode 100644
index 000000000000..aa51ffcfa029
--- /dev/null
+++ b/Documentation/filesystems/api-summary.rst
@@ -0,0 +1,150 @@
+=============================
+Linux Filesystems API summary
+=============================
+
+This section contains API-level documentation, mostly taken from the source
+code itself.
+
+The Linux VFS
+=============
+
+The Filesystem types
+--------------------
+
+.. kernel-doc:: include/linux/fs.h
+ :internal:
+
+The Directory Cache
+-------------------
+
+.. kernel-doc:: fs/dcache.c
+ :export:
+
+.. kernel-doc:: include/linux/dcache.h
+ :internal:
+
+Inode Handling
+--------------
+
+.. kernel-doc:: fs/inode.c
+ :export:
+
+.. kernel-doc:: fs/bad_inode.c
+ :export:
+
+Registration and Superblocks
+----------------------------
+
+.. kernel-doc:: fs/super.c
+ :export:
+
+File Locks
+----------
+
+.. kernel-doc:: fs/locks.c
+ :export:
+
+.. kernel-doc:: fs/locks.c
+ :internal:
+
+Other Functions
+---------------
+
+.. kernel-doc:: fs/mpage.c
+ :export:
+
+.. kernel-doc:: fs/namei.c
+ :export:
+
+.. kernel-doc:: fs/buffer.c
+ :export:
+
+.. kernel-doc:: block/bio.c
+ :export:
+
+.. kernel-doc:: fs/seq_file.c
+ :export:
+
+.. kernel-doc:: fs/filesystems.c
+ :export:
+
+.. kernel-doc:: fs/fs-writeback.c
+ :export:
+
+.. kernel-doc:: fs/block_dev.c
+ :export:
+
+.. kernel-doc:: fs/anon_inodes.c
+ :export:
+
+.. kernel-doc:: fs/attr.c
+ :export:
+
+.. kernel-doc:: fs/d_path.c
+ :export:
+
+.. kernel-doc:: fs/dax.c
+ :export:
+
+.. kernel-doc:: fs/direct-io.c
+ :export:
+
+.. kernel-doc:: fs/file_table.c
+ :export:
+
+.. kernel-doc:: fs/libfs.c
+ :export:
+
+.. kernel-doc:: fs/posix_acl.c
+ :export:
+
+.. kernel-doc:: fs/stat.c
+ :export:
+
+.. kernel-doc:: fs/sync.c
+ :export:
+
+.. kernel-doc:: fs/xattr.c
+ :export:
+
+The proc filesystem
+===================
+
+sysctl interface
+----------------
+
+.. kernel-doc:: kernel/sysctl.c
+ :export:
+
+proc filesystem interface
+-------------------------
+
+.. kernel-doc:: fs/proc/base.c
+ :internal:
+
+Events based on file descriptors
+================================
+
+.. kernel-doc:: fs/eventfd.c
+ :export:
+
+The Filesystem for Exporting Kernel Objects
+===========================================
+
+.. kernel-doc:: fs/sysfs/file.c
+ :export:
+
+.. kernel-doc:: fs/sysfs/symlink.c
+ :export:
+
+The debugfs filesystem
+======================
+
+debugfs interface
+-----------------
+
+.. kernel-doc:: fs/debugfs/inode.c
+ :export:
+
+.. kernel-doc:: fs/debugfs/file.c
+ :export:
diff --git a/Documentation/filesystems/binderfs.rst b/Documentation/filesystems/binderfs.rst
new file mode 100644
index 000000000000..c009671f8434
--- /dev/null
+++ b/Documentation/filesystems/binderfs.rst
@@ -0,0 +1,68 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Android binderfs Filesystem
+===============================
+
+Android binderfs is a filesystem for the Android binder IPC mechanism. It
+allows to dynamically add and remove binder devices at runtime. Binder devices
+located in a new binderfs instance are independent of binder devices located in
+other binderfs instances. Mounting a new binderfs instance makes it possible
+to get a set of private binder devices.
+
+Mounting binderfs
+-----------------
+
+Android binderfs can be mounted with::
+
+ mkdir /dev/binderfs
+ mount -t binder binder /dev/binderfs
+
+at which point a new instance of binderfs will show up at ``/dev/binderfs``.
+In a fresh instance of binderfs no binder devices will be present. There will
+only be a ``binder-control`` device which serves as the request handler for
+binderfs. Mounting another binderfs instance at a different location will
+create a new and separate instance from all other binderfs mounts. This is
+identical to the behavior of e.g. ``devpts`` and ``tmpfs``. The Android
+binderfs filesystem can be mounted in user namespaces.
+
+Options
+-------
+max
+ binderfs instances can be mounted with a limit on the number of binder
+ devices that can be allocated. The ``max=<count>`` mount option serves as
+ a per-instance limit. If ``max=<count>`` is set then only ``<count>`` number
+ of binder devices can be allocated in this binderfs instance.
+
+Allocating binder Devices
+-------------------------
+
+.. _ioctl: http://man7.org/linux/man-pages/man2/ioctl.2.html
+
+To allocate a new binder device in a binderfs instance a request needs to be
+sent through the ``binder-control`` device node. A request is sent in the form
+of an `ioctl() <ioctl_>`_.
+
+What a program needs to do is to open the ``binder-control`` device node and
+send a ``BINDER_CTL_ADD`` request to the kernel. Users of binderfs need to
+tell the kernel which name the new binder device should get. By default a name
+can only contain up to ``BINDERFS_MAX_NAME`` chars including the terminating
+zero byte.
+
+Once the request is made via an `ioctl() <ioctl_>`_ passing a ``struct
+binder_device`` with the name to the kernel it will allocate a new binder
+device and return the major and minor number of the new device in the struct
+(This is necessary because binderfs allocates a major device number
+dynamically.). After the `ioctl() <ioctl_>`_ returns there will be a new
+binder device located under /dev/binderfs with the chosen name.
+
+Deleting binder Devices
+-----------------------
+
+.. _unlink: http://man7.org/linux/man-pages/man2/unlink.2.html
+.. _rm: http://man7.org/linux/man-pages/man1/rm.1.html
+
+Binderfs binder devices can be deleted via `unlink() <unlink_>`_. This means
+that the `rm() <rm_>`_ tool can be used to delete them. Note that the
+``binder-control`` device cannot be deleted since this would make the binderfs
+instance unuseable. The ``binder-control`` device will be deleted when the
+binderfs instance is unmounted and all references to it have been dropped.
diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt
index 1177052701e1..d2c6a5ccf0f5 100644
--- a/Documentation/filesystems/ceph.txt
+++ b/Documentation/filesystems/ceph.txt
@@ -22,9 +22,7 @@ In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
on symmetric access by all clients to shared block devices, Ceph
separates data and metadata management into independent server
clusters, similar to Lustre. Unlike Lustre, however, metadata and
-storage nodes run entirely as user space daemons. Storage nodes
-utilize btrfs to store data objects, leveraging its advanced features
-(checksumming, metadata replication, etc.). File data is striped
+storage nodes run entirely as user space daemons. File data is striped
across storage nodes in large chunks to distribute workload and
facilitate high throughputs. When storage nodes fail, data is
re-replicated in a distributed fashion by the storage nodes themselves
@@ -118,6 +116,10 @@ Mount Options
of a non-responsive Ceph file system. The default is 30
seconds.
+ caps_max=X
+ Specify the maximum number of caps to hold. Unused caps are released
+ when number of caps exceeds the limit. The default is 0 (no limit)
+
rbytes
When stat() is called on a directory, set st_size to 'rbytes',
the summation of file sizes over all files nested beneath that
@@ -160,11 +162,11 @@ More Information
================
For more information on Ceph, see the home page at
- http://ceph.newdream.net/
+ https://ceph.com/
The Linux kernel client source tree is available at
- git://ceph.newdream.net/git/ceph-client.git
+ https://github.com/ceph/ceph-client.git
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
and the source for the full system is at
- git://ceph.newdream.net/git/ceph.git
+ https://github.com/ceph/ceph.git
diff --git a/Documentation/filesystems/exofs.txt b/Documentation/filesystems/exofs.txt
deleted file mode 100644
index 23583a136975..000000000000
--- a/Documentation/filesystems/exofs.txt
+++ /dev/null
@@ -1,185 +0,0 @@
-===============================================================================
-WHAT IS EXOFS?
-===============================================================================
-
-exofs is a file system that uses an OSD and exports the API of a normal Linux
-file system. Users access exofs like any other local file system, and exofs
-will in turn issue commands to the local OSD initiator.
-
-OSD is a new T10 command set that views storage devices not as a large/flat
-array of sectors but as a container of objects, each having a length, quota,
-time attributes and more. Each object is addressed by a 64bit ID, and is
-contained in a 64bit ID partition. Each object has associated attributes
-attached to it, which are integral part of the object and provide metadata about
-the object. The standard defines some common obligatory attributes, but user
-attributes can be added as needed.
-
-===============================================================================
-ENVIRONMENT
-===============================================================================
-
-To use this file system, you need to have an object store to run it on. You
-may download a target from:
-http://open-osd.org
-
-See Documentation/scsi/osd.txt for how to setup a working osd environment.
-
-===============================================================================
-USAGE
-===============================================================================
-
-1. Download and compile exofs and open-osd initiator:
- You need an external Kernel source tree or kernel headers from your
- distribution. (anything based on 2.6.26 or later).
-
- a. download open-osd including exofs source using:
- [parent-directory]$ git clone git://git.open-osd.org/open-osd.git
-
- b. Build the library module like this:
- [parent-directory]$ make -C KSRC=$(KER_DIR) open-osd
-
- This will build both the open-osd initiator as well as the exofs kernel
- module. Use whatever parameters you compiled your Kernel with and
- $(KER_DIR) above pointing to the Kernel you compile against. See the file
- open-osd/top-level-Makefile for an example.
-
-2. Get the OSD initiator and target set up properly, and login to the target.
- See Documentation/scsi/osd.txt for farther instructions. Also see ./do-osd
- for example script that does all these steps.
-
-3. Insmod the exofs.ko module:
- [exofs]$ insmod exofs.ko
-
-4. Make sure the directory where you want to mount exists. If not, create it.
- (For example, mkdir /mnt/exofs)
-
-5. At first run you will need to invoke the mkfs.exofs application
-
- As an example, this will create the file system on:
- /dev/osd0 partition ID 65536
-
- mkfs.exofs --pid=65536 --format /dev/osd0
-
- The --format is optional. If not specified, no OSD_FORMAT will be
- performed and a clean file system will be created in the specified pid,
- in the available space of the target. (Use --format=size_in_meg to limit
- the total LUN space available)
-
- If pid already exists, it will be deleted and a new one will be created in
- its place. Be careful.
-
- An exofs lives inside a single OSD partition. You can create multiple exofs
- filesystems on the same device using multiple pids.
-
- (run mkfs.exofs without any parameters for usage help message)
-
-6. Mount the file system.
-
- For example, to mount /dev/osd0, partition ID 0x10000 on /mnt/exofs:
-
- mount -t exofs -o pid=65536 /dev/osd0 /mnt/exofs/
-
-7. For reference (See do-exofs example script):
- do-exofs start - an example of how to perform the above steps.
- do-exofs stop - an example of how to unmount the file system.
- do-exofs format - an example of how to format and mkfs a new exofs.
-
-8. Extra compilation flags (uncomment in fs/exofs/Kbuild):
- CONFIG_EXOFS_DEBUG - for debug messages and extra checks.
-
-===============================================================================
-exofs mount options
-===============================================================================
-Similar to any mount command:
- mount -t exofs -o exofs_options /dev/osdX mount_exofs_directory
-
-Where:
- -t exofs: specifies the exofs file system
-
- /dev/osdX: X is a decimal number. /dev/osdX was created after a successful
- login into an OSD target.
-
- mount_exofs_directory: The directory to mount the file system on
-
- exofs specific options: Options are separated by commas (,)
- pid=<integer> - The partition number to mount/create as
- container of the filesystem.
- This option is mandatory. integer can be
- Hex by pre-pending an 0x to the number.
- osdname=<id> - Mount by a device's osdname.
- osdname is usually a 36 character uuid of the
- form "d2683732-c906-4ee1-9dbd-c10c27bb40df".
- It is one of the device's uuid specified in the
- mkfs.exofs format command.
- If this option is specified then the /dev/osdX
- above can be empty and is ignored.
- to=<integer> - Timeout in ticks for a single command.
- default is (60 * HZ) [for debugging only]
-
-===============================================================================
-DESIGN
-===============================================================================
-
-* The file system control block (AKA on-disk superblock) resides in an object
- with a special ID (defined in common.h).
- Information included in the file system control block is used to fill the
- in-memory superblock structure at mount time. This object is created before
- the file system is used by mkexofs.c. It contains information such as:
- - The file system's magic number
- - The next inode number to be allocated
-
-* Each file resides in its own object and contains the data (and it will be
- possible to extend the file over multiple objects, though this has not been
- implemented yet).
-
-* A directory is treated as a file, and essentially contains a list of <file
- name, inode #> pairs for files that are found in that directory. The object
- IDs correspond to the files' inode numbers and will be allocated according to
- a bitmap (stored in a separate object). Now they are allocated using a
- counter.
-
-* Each file's control block (AKA on-disk inode) is stored in its object's
- attributes. This applies to both regular files and other types (directories,
- device files, symlinks, etc.).
-
-* Credentials are generated per object (inode and superblock) when they are
- created in memory (read from disk or created). The credential works for all
- operations and is used as long as the object remains in memory.
-
-* Async OSD operations are used whenever possible, but the target may execute
- them out of order. The operations that concern us are create, delete,
- readpage, writepage, update_inode, and truncate. The following pairs of
- operations should execute in the order written, and we need to prevent them
- from executing in reverse order:
- - The following are handled with the OBJ_CREATED and OBJ_2BCREATED
- flags. OBJ_CREATED is set when we know the object exists on the OSD -
- in create's callback function, and when we successfully do a
- read_inode.
- OBJ_2BCREATED is set in the beginning of the create function, so we
- know that we should wait.
- - create/delete: delete should wait until the object is created
- on the OSD.
- - create/readpage: readpage should be able to return a page
- full of zeroes in this case. If there was a write already
- en-route (i.e. create, writepage, readpage) then the page
- would be locked, and so it would really be the same as
- create/writepage.
- - create/writepage: if writepage is called for a sync write, it
- should wait until the object is created on the OSD.
- Otherwise, it should just return.
- - create/truncate: truncate should wait until the object is
- created on the OSD.
- - create/update_inode: update_inode should wait until the
- object is created on the OSD.
- - Handled by VFS locks:
- - readpage/delete: shouldn't happen because of page lock.
- - writepage/delete: shouldn't happen because of page lock.
- - readpage/writepage: shouldn't happen because of page lock.
-
-===============================================================================
-LICENSE/COPYRIGHT
-===============================================================================
-The exofs file system is based on ext2 v0.5b (distributed with the Linux kernel
-version 2.6.10). All files include the original copyrights, and the license
-is GPL version 2 (only version 2, as is true for the Linux kernel). The
-Linux kernel can be downloaded from www.kernel.org.
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 3a7b60521b94..08c23b60e016 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -343,9 +343,9 @@ FS_IOC_SET_ENCRYPTION_POLICY can fail with the following errors:
- ``ENOTEMPTY``: the file is unencrypted and is a nonempty directory
- ``ENOTTY``: this type of filesystem does not implement encryption
- ``EOPNOTSUPP``: the kernel was not configured with encryption
- support for this filesystem, or the filesystem superblock has not
+ support for filesystems, or the filesystem superblock has not
had encryption enabled on it. (For example, to use encryption on an
- ext4 filesystem, CONFIG_EXT4_ENCRYPTION must be enabled in the
+ ext4 filesystem, CONFIG_FS_ENCRYPTION must be enabled in the
kernel config, and the superblock must have had the "encrypt"
feature flag enabled using ``tune2fs -O encrypt`` or ``mkfs.ext4 -O
encrypt``.)
@@ -451,10 +451,18 @@ astute users may notice some differences in behavior:
- Unencrypted files, or files encrypted with a different encryption
policy (i.e. different key, modes, or flags), cannot be renamed or
linked into an encrypted directory; see `Encryption policy
- enforcement`_. Attempts to do so will fail with EPERM. However,
+ enforcement`_. Attempts to do so will fail with EXDEV. However,
encrypted files can be renamed within an encrypted directory, or
into an unencrypted directory.
+ Note: "moving" an unencrypted file into an encrypted directory, e.g.
+ with the `mv` program, is implemented in userspace by a copy
+ followed by a delete. Be aware that the original unencrypted data
+ may remain recoverable from free space on the disk; prefer to keep
+ all files encrypted from the very beginning. The `shred` program
+ may be used to overwrite the source files but isn't guaranteed to be
+ effective on all filesystems and storage devices.
+
- Direct I/O is not supported on encrypted files. Attempts to use
direct I/O on such files will fall back to buffered I/O.
@@ -541,7 +549,7 @@ not be encrypted.
Except for those special files, it is forbidden to have unencrypted
files, or files encrypted with a different encryption policy, in an
encrypted directory tree. Attempts to link or rename such a file into
-an encrypted directory will fail with EPERM. This is also enforced
+an encrypted directory will fail with EXDEV. This is also enforced
during ->lookup() to provide limited protection against offline
attacks that try to disable or downgrade encryption in known locations
where applications may later write sensitive data. It is recommended
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 605befab300b..1131c34d77f6 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -1,382 +1,43 @@
-=====================
-Linux Filesystems API
-=====================
+===============================
+Filesystems in the Linux kernel
+===============================
-The Linux VFS
-=============
+This under-development manual will, some glorious day, provide
+comprehensive information on how the Linux virtual filesystem (VFS) layer
+works, along with the filesystems that sit below it. For now, what we have
+can be found below.
-The Filesystem types
---------------------
-
-.. kernel-doc:: include/linux/fs.h
- :internal:
-
-The Directory Cache
--------------------
-
-.. kernel-doc:: fs/dcache.c
- :export:
-
-.. kernel-doc:: include/linux/dcache.h
- :internal:
-
-Inode Handling
---------------
-
-.. kernel-doc:: fs/inode.c
- :export:
-
-.. kernel-doc:: fs/bad_inode.c
- :export:
-
-Registration and Superblocks
-----------------------------
-
-.. kernel-doc:: fs/super.c
- :export:
-
-File Locks
-----------
-
-.. kernel-doc:: fs/locks.c
- :export:
-
-.. kernel-doc:: fs/locks.c
- :internal:
-
-Other Functions
----------------
-
-.. kernel-doc:: fs/mpage.c
- :export:
-
-.. kernel-doc:: fs/namei.c
- :export:
-
-.. kernel-doc:: fs/buffer.c
- :export:
-
-.. kernel-doc:: block/bio.c
- :export:
-
-.. kernel-doc:: fs/seq_file.c
- :export:
-
-.. kernel-doc:: fs/filesystems.c
- :export:
-
-.. kernel-doc:: fs/fs-writeback.c
- :export:
-
-.. kernel-doc:: fs/block_dev.c
- :export:
-
-.. kernel-doc:: fs/anon_inodes.c
- :export:
-
-.. kernel-doc:: fs/attr.c
- :export:
-
-.. kernel-doc:: fs/d_path.c
- :export:
-
-.. kernel-doc:: fs/dax.c
- :export:
-
-.. kernel-doc:: fs/direct-io.c
- :export:
-
-.. kernel-doc:: fs/file_table.c
- :export:
-
-.. kernel-doc:: fs/libfs.c
- :export:
-
-.. kernel-doc:: fs/posix_acl.c
- :export:
-
-.. kernel-doc:: fs/stat.c
- :export:
-
-.. kernel-doc:: fs/sync.c
- :export:
-
-.. kernel-doc:: fs/xattr.c
- :export:
-
-The proc filesystem
-===================
-
-sysctl interface
-----------------
-
-.. kernel-doc:: kernel/sysctl.c
- :export:
-
-proc filesystem interface
--------------------------
-
-.. kernel-doc:: fs/proc/base.c
- :internal:
-
-Events based on file descriptors
-================================
-
-.. kernel-doc:: fs/eventfd.c
- :export:
-
-The Filesystem for Exporting Kernel Objects
-===========================================
-
-.. kernel-doc:: fs/sysfs/file.c
- :export:
-
-.. kernel-doc:: fs/sysfs/symlink.c
- :export:
-
-The debugfs filesystem
+Core VFS documentation
======================
-debugfs interface
------------------
+See these manuals for documentation about the VFS layer itself and how its
+algorithms work.
-.. kernel-doc:: fs/debugfs/inode.c
- :export:
+.. toctree::
+ :maxdepth: 2
-.. kernel-doc:: fs/debugfs/file.c
- :export:
+ path-lookup.rst
+ api-summary
+ splice
-The Linux Journalling API
+Filesystem support layers
=========================
-Overview
---------
-
-Details
-~~~~~~~
-
-The journalling layer is easy to use. You need to first of all create a
-journal_t data structure. There are two calls to do this dependent on
-how you decide to allocate the physical media on which the journal
-resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in
-filesystem inodes, or the :c:func:`jbd2_journal_init_dev` call can be used
-for journal stored on a raw device (in a continuous range of blocks). A
-journal_t is a typedef for a struct pointer, so when you are finally
-finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up
-any used kernel memory.
-
-Once you have got your journal_t object you need to 'mount' or load the
-journal file. The journalling layer expects the space for the journal
-was already allocated and initialized properly by the userspace tools.
-When loading the journal you must call :c:func:`jbd2_journal_load` to process
-journal contents. If the client file system detects the journal contents
-does not need to be processed (or even need not have valid contents), it
-may call :c:func:`jbd2_journal_wipe` to clear the journal contents before
-calling :c:func:`jbd2_journal_load`.
-
-Note that jbd2_journal_wipe(..,0) calls
-:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
-transactions in the journal and similarly :c:func:`jbd2_journal_load` will
-call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
-:c:func:`ext4_load_journal` in fs/ext4/super.c for examples on this stage.
-
-Now you can go ahead and start modifying the underlying filesystem.
-Almost.
-
-You still need to actually journal your filesystem changes, this is done
-by wrapping them into transactions. Additionally you also need to wrap
-the modification of each of the buffers with calls to the journal layer,
-so it knows what the modifications you are actually making are. To do
-this use :c:func:`jbd2_journal_start` which returns a transaction handle.
-
-:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
-which indicates the end of a transaction are nestable calls, so you can
-reenter a transaction if necessary, but remember you must call
-:c:func:`jbd2_journal_stop` the same number of times as
-:c:func:`jbd2_journal_start` before the transaction is completed (or more
-accurately leaves the update phase). Ext4/VFS makes use of this feature to
-simplify handling of inode dirtying, quota support, etc.
-
-Inside each transaction you need to wrap the modifications to the
-individual buffers (blocks). Before you start to modify a buffer you
-need to call :c:func:`jbd2_journal_get_create_access()` /
-:c:func:`jbd2_journal_get_write_access()` /
-:c:func:`jbd2_journal_get_undo_access()` as appropriate, this allows the
-journalling layer to copy the unmodified
-data if it needs to. After all the buffer may be part of a previously
-uncommitted transaction. At this point you are at last ready to modify a
-buffer, and once you are have done so you need to call
-:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a
-buffer you now know is now longer required to be pushed back on the
-device you can call :c:func:`jbd2_journal_forget` in much the same way as you
-might have used :c:func:`bforget` in the past.
-
-A :c:func:`jbd2_journal_flush` may be called at any time to commit and
-checkpoint all your transactions.
-
-Then at umount time , in your :c:func:`put_super` you can then call
-:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.
-
-Unfortunately there a couple of ways the journal layer can cause a
-deadlock. The first thing to note is that each task can only have a
-single outstanding transaction at any one time, remember nothing commits
-until the outermost :c:func:`jbd2_journal_stop`. This means you must complete
-the transaction at the end of each file/inode/address etc. operation you
-perform, so that the journalling system isn't re-entered on another
-journal. Since transactions can't be nested/batched across differing
-journals, and another filesystem other than yours (say ext4) may be
-modified in a later syscall.
-
-The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
-if there isn't enough space in the journal for your transaction (based
-on the passed nblocks param) - when it blocks it merely(!) needs to wait
-for transactions to complete and be committed from other tasks, so
-essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid
-deadlocks you must treat :c:func:`jbd2_journal_start` /
-:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
-your semaphore ordering rules to prevent
-deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking
-behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as
-easily as on :c:func:`jbd2_journal_start`.
-
-Try to reserve the right number of blocks the first time. ;-). This will
-be the maximum number of blocks you are going to touch in this
-transaction. I advise having a look at at least ext4_jbd.h to see the
-basis on which ext4 uses to make these decisions.
-
-Another wriggle to watch out for is your on-disk block allocation
-strategy. Why? Because, if you do a delete, you need to ensure you
-haven't reused any of the freed blocks until the transaction freeing
-these blocks commits. If you reused these blocks and crash happens,
-there is no way to restore the contents of the reallocated blocks at the
-end of the last fully committed transaction. One simple way of doing
-this is to mark blocks as free in internal in-memory block allocation
-structures only after the transaction freeing them commits. Ext4 uses
-journal commit callback for this purpose.
-
-With journal commit callbacks you can ask the journalling layer to call
-a callback function when the transaction is finally committed to disk,
-so that you can do some of your own management. You ask the journalling
-layer for calling the callback by simply setting
-``journal->j_commit_callback`` function pointer and that function is
-called after each transaction commit. You can also use
-``transaction->t_private_list`` for attaching entries to a transaction
-that need processing when the transaction commits.
-
-JBD2 also provides a way to block all transaction updates via
-:c:func:`jbd2_journal_lock_updates()` /
-:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
-window with a clean and stable fs for a moment. E.g.
-
-::
-
-
- jbd2_journal_lock_updates() //stop new stuff happening..
- jbd2_journal_flush() // checkpoint everything.
- ..do stuff on stable fs
- jbd2_journal_unlock_updates() // carry on with filesystem use.
-
-The opportunities for abuse and DOS attacks with this should be obvious,
-if you allow unprivileged userspace to trigger codepaths containing
-these calls.
-
-Summary
-~~~~~~~
-
-Using the journal is a matter of wrapping the different context changes,
-being each mount, each modification (transaction) and each changed
-buffer to tell the journalling layer about them.
-
-Data Types
-----------
-
-The journalling layer uses typedefs to 'hide' the concrete definitions
-of the structures used. As a client of the JBD2 layer you can just rely
-on the using the pointer as a magic cookie of some sort. Obviously the
-hiding is not enforced as this is 'C'.
-
-Structures
-~~~~~~~~~~
-
-.. kernel-doc:: include/linux/jbd2.h
- :internal:
-
-Functions
----------
-
-The functions here are split into two groups those that affect a journal
-as a whole, and those which are used to manage transactions
-
-Journal Level
-~~~~~~~~~~~~~
-
-.. kernel-doc:: fs/jbd2/journal.c
- :export:
-
-.. kernel-doc:: fs/jbd2/recovery.c
- :internal:
-
-Transasction Level
-~~~~~~~~~~~~~~~~~~
-
-.. kernel-doc:: fs/jbd2/transaction.c
-
-See also
---------
-
-`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
-Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__
-
-`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
-Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__
-
-splice API
-==========
-
-splice is a method for moving blocks of data around inside the kernel,
-without continually transferring them between the kernel and user space.
-
-.. kernel-doc:: fs/splice.c
-
-pipes API
-=========
-
-Pipe interfaces are all for in-kernel (builtin image) use. They are not
-exported for use by modules.
-
-.. kernel-doc:: include/linux/pipe_fs_i.h
- :internal:
-
-.. kernel-doc:: fs/pipe.c
-
-Encryption API
-==============
-
-A library which filesystems can hook into to support transparent
-encryption of files and directories.
+Documentation for the support code within the filesystem layer for use in
+filesystem implementations.
.. toctree::
- :maxdepth: 2
-
- fscrypt
-
-Pathname lookup
-===============
-
-
-This write-up is based on three articles published at lwn.net:
+ :maxdepth: 2
-- <https://lwn.net/Articles/649115/> Pathname lookup in Linux
-- <https://lwn.net/Articles/649729/> RCU-walk: faster pathname lookup in Linux
-- <https://lwn.net/Articles/650786/> A walk among the symlinks
+ journalling
+ fscrypt
-Written by Neil Brown with help from Al Viro and Jon Corbet.
-It has subsequently been updated to reflect changes in the kernel
-including:
+Filesystem-specific documentation
+=================================
-- per-directory parallel name lookup.
+Documentation for individual filesystem types can be found here.
.. toctree::
:maxdepth: 2
- path-lookup.rst
+ binderfs.rst
diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst
new file mode 100644
index 000000000000..58ce6b395206
--- /dev/null
+++ b/Documentation/filesystems/journalling.rst
@@ -0,0 +1,184 @@
+The Linux Journalling API
+=========================
+
+Overview
+--------
+
+Details
+~~~~~~~
+
+The journalling layer is easy to use. You need to first of all create a
+journal_t data structure. There are two calls to do this dependent on
+how you decide to allocate the physical media on which the journal
+resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in
+filesystem inodes, or the :c:func:`jbd2_journal_init_dev` call can be used
+for journal stored on a raw device (in a continuous range of blocks). A
+journal_t is a typedef for a struct pointer, so when you are finally
+finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up
+any used kernel memory.
+
+Once you have got your journal_t object you need to 'mount' or load the
+journal file. The journalling layer expects the space for the journal
+was already allocated and initialized properly by the userspace tools.
+When loading the journal you must call :c:func:`jbd2_journal_load` to process
+journal contents. If the client file system detects the journal contents
+does not need to be processed (or even need not have valid contents), it
+may call :c:func:`jbd2_journal_wipe` to clear the journal contents before
+calling :c:func:`jbd2_journal_load`.
+
+Note that jbd2_journal_wipe(..,0) calls
+:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
+transactions in the journal and similarly :c:func:`jbd2_journal_load` will
+call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
+:c:func:`ext4_load_journal` in fs/ext4/super.c for examples on this stage.
+
+Now you can go ahead and start modifying the underlying filesystem.
+Almost.
+
+You still need to actually journal your filesystem changes, this is done
+by wrapping them into transactions. Additionally you also need to wrap
+the modification of each of the buffers with calls to the journal layer,
+so it knows what the modifications you are actually making are. To do
+this use :c:func:`jbd2_journal_start` which returns a transaction handle.
+
+:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
+which indicates the end of a transaction are nestable calls, so you can
+reenter a transaction if necessary, but remember you must call
+:c:func:`jbd2_journal_stop` the same number of times as
+:c:func:`jbd2_journal_start` before the transaction is completed (or more
+accurately leaves the update phase). Ext4/VFS makes use of this feature to
+simplify handling of inode dirtying, quota support, etc.
+
+Inside each transaction you need to wrap the modifications to the
+individual buffers (blocks). Before you start to modify a buffer you
+need to call :c:func:`jbd2_journal_get_create_access()` /
+:c:func:`jbd2_journal_get_write_access()` /
+:c:func:`jbd2_journal_get_undo_access()` as appropriate, this allows the
+journalling layer to copy the unmodified
+data if it needs to. After all the buffer may be part of a previously
+uncommitted transaction. At this point you are at last ready to modify a
+buffer, and once you are have done so you need to call
+:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a
+buffer you now know is now longer required to be pushed back on the
+device you can call :c:func:`jbd2_journal_forget` in much the same way as you
+might have used :c:func:`bforget` in the past.
+
+A :c:func:`jbd2_journal_flush` may be called at any time to commit and
+checkpoint all your transactions.
+
+Then at umount time , in your :c:func:`put_super` you can then call
+:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.
+
+Unfortunately there a couple of ways the journal layer can cause a
+deadlock. The first thing to note is that each task can only have a
+single outstanding transaction at any one time, remember nothing commits
+until the outermost :c:func:`jbd2_journal_stop`. This means you must complete
+the transaction at the end of each file/inode/address etc. operation you
+perform, so that the journalling system isn't re-entered on another
+journal. Since transactions can't be nested/batched across differing
+journals, and another filesystem other than yours (say ext4) may be
+modified in a later syscall.
+
+The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
+if there isn't enough space in the journal for your transaction (based
+on the passed nblocks param) - when it blocks it merely(!) needs to wait
+for transactions to complete and be committed from other tasks, so
+essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid
+deadlocks you must treat :c:func:`jbd2_journal_start` /
+:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
+your semaphore ordering rules to prevent
+deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking
+behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as
+easily as on :c:func:`jbd2_journal_start`.
+
+Try to reserve the right number of blocks the first time. ;-). This will
+be the maximum number of blocks you are going to touch in this
+transaction. I advise having a look at at least ext4_jbd.h to see the
+basis on which ext4 uses to make these decisions.
+
+Another wriggle to watch out for is your on-disk block allocation
+strategy. Why? Because, if you do a delete, you need to ensure you
+haven't reused any of the freed blocks until the transaction freeing
+these blocks commits. If you reused these blocks and crash happens,
+there is no way to restore the contents of the reallocated blocks at the
+end of the last fully committed transaction. One simple way of doing
+this is to mark blocks as free in internal in-memory block allocation
+structures only after the transaction freeing them commits. Ext4 uses
+journal commit callback for this purpose.
+
+With journal commit callbacks you can ask the journalling layer to call
+a callback function when the transaction is finally committed to disk,
+so that you can do some of your own management. You ask the journalling
+layer for calling the callback by simply setting
+``journal->j_commit_callback`` function pointer and that function is
+called after each transaction commit. You can also use
+``transaction->t_private_list`` for attaching entries to a transaction
+that need processing when the transaction commits.
+
+JBD2 also provides a way to block all transaction updates via
+:c:func:`jbd2_journal_lock_updates()` /
+:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
+window with a clean and stable fs for a moment. E.g.
+
+::
+
+
+ jbd2_journal_lock_updates() //stop new stuff happening..
+ jbd2_journal_flush() // checkpoint everything.
+ ..do stuff on stable fs
+ jbd2_journal_unlock_updates() // carry on with filesystem use.
+
+The opportunities for abuse and DOS attacks with this should be obvious,
+if you allow unprivileged userspace to trigger codepaths containing
+these calls.
+
+Summary
+~~~~~~~
+
+Using the journal is a matter of wrapping the different context changes,
+being each mount, each modification (transaction) and each changed
+buffer to tell the journalling layer about them.
+
+Data Types
+----------
+
+The journalling layer uses typedefs to 'hide' the concrete definitions
+of the structures used. As a client of the JBD2 layer you can just rely
+on the using the pointer as a magic cookie of some sort. Obviously the
+hiding is not enforced as this is 'C'.
+
+Structures
+~~~~~~~~~~
+
+.. kernel-doc:: include/linux/jbd2.h
+ :internal:
+
+Functions
+---------
+
+The functions here are split into two groups those that affect a journal
+as a whole, and those which are used to manage transactions
+
+Journal Level
+~~~~~~~~~~~~~
+
+.. kernel-doc:: fs/jbd2/journal.c
+ :export:
+
+.. kernel-doc:: fs/jbd2/recovery.c
+ :internal:
+
+Transasction Level
+~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: fs/jbd2/transaction.c
+
+See also
+--------
+
+`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
+Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__
+
+`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
+Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__
+
diff --git a/Documentation/filesystems/mount_api.txt b/Documentation/filesystems/mount_api.txt
new file mode 100644
index 000000000000..944d1965e917
--- /dev/null
+++ b/Documentation/filesystems/mount_api.txt
@@ -0,0 +1,709 @@
+ ====================
+ FILESYSTEM MOUNT API
+ ====================
+
+CONTENTS
+
+ (1) Overview.
+
+ (2) The filesystem context.
+
+ (3) The filesystem context operations.
+
+ (4) Filesystem context security.
+
+ (5) VFS filesystem context operations.
+
+ (6) Parameter description.
+
+ (7) Parameter helper functions.
+
+
+========
+OVERVIEW
+========
+
+The creation of new mounts is now to be done in a multistep process:
+
+ (1) Create a filesystem context.
+
+ (2) Parse the parameters and attach them to the context. Parameters are
+ expected to be passed individually from userspace, though legacy binary
+ parameters can also be handled.
+
+ (3) Validate and pre-process the context.
+
+ (4) Get or create a superblock and mountable root.
+
+ (5) Perform the mount.
+
+ (6) Return an error message attached to the context.
+
+ (7) Destroy the context.
+
+To support this, the file_system_type struct gains a new field:
+
+ int (*init_fs_context)(struct fs_context *fc);
+
+which is invoked to set up the filesystem-specific parts of a filesystem
+context, including the additional space.
+
+Note that security initialisation is done *after* the filesystem is called so
+that the namespaces may be adjusted first.
+
+
+======================
+THE FILESYSTEM CONTEXT
+======================
+
+The creation and reconfiguration of a superblock is governed by a filesystem
+context. This is represented by the fs_context structure:
+
+ struct fs_context {
+ const struct fs_context_operations *ops;
+ struct file_system_type *fs_type;
+ void *fs_private;
+ struct dentry *root;
+ struct user_namespace *user_ns;
+ struct net *net_ns;
+ const struct cred *cred;
+ char *source;
+ char *subtype;
+ void *security;
+ void *s_fs_info;
+ unsigned int sb_flags;
+ unsigned int sb_flags_mask;
+ enum fs_context_purpose purpose:8;
+ bool sloppy:1;
+ bool silent:1;
+ ...
+ };
+
+The fs_context fields are as follows:
+
+ (*) const struct fs_context_operations *ops
+
+ These are operations that can be done on a filesystem context (see
+ below). This must be set by the ->init_fs_context() file_system_type
+ operation.
+
+ (*) struct file_system_type *fs_type
+
+ A pointer to the file_system_type of the filesystem that is being
+ constructed or reconfigured. This retains a reference on the type owner.
+
+ (*) void *fs_private
+
+ A pointer to the file system's private data. This is where the filesystem
+ will need to store any options it parses.
+
+ (*) struct dentry *root
+
+ A pointer to the root of the mountable tree (and indirectly, the
+ superblock thereof). This is filled in by the ->get_tree() op. If this
+ is set, an active reference on root->d_sb must also be held.
+
+ (*) struct user_namespace *user_ns
+ (*) struct net *net_ns
+
+ There are a subset of the namespaces in use by the invoking process. They
+ retain references on each namespace. The subscribed namespaces may be
+ replaced by the filesystem to reflect other sources, such as the parent
+ mount superblock on an automount.
+
+ (*) const struct cred *cred
+
+ The mounter's credentials. This retains a reference on the credentials.
+
+ (*) char *source
+
+ This specifies the source. It may be a block device (e.g. /dev/sda1) or
+ something more exotic, such as the "host:/path" that NFS desires.
+
+ (*) char *subtype
+
+ This is a string to be added to the type displayed in /proc/mounts to
+ qualify it (used by FUSE). This is available for the filesystem to set if
+ desired.
+
+ (*) void *security
+
+ A place for the LSMs to hang their security data for the superblock. The
+ relevant security operations are described below.
+
+ (*) void *s_fs_info
+
+ The proposed s_fs_info for a new superblock, set in the superblock by
+ sget_fc(). This can be used to distinguish superblocks.
+
+ (*) unsigned int sb_flags
+ (*) unsigned int sb_flags_mask
+
+ Which bits SB_* flags are to be set/cleared in super_block::s_flags.
+
+ (*) enum fs_context_purpose
+
+ This indicates the purpose for which the context is intended. The
+ available values are:
+
+ FS_CONTEXT_FOR_MOUNT, -- New superblock for explicit mount
+ FS_CONTEXT_FOR_SUBMOUNT -- New automatic submount of extant mount
+ FS_CONTEXT_FOR_RECONFIGURE -- Change an existing mount
+
+ (*) bool sloppy
+ (*) bool silent
+
+ These are set if the sloppy or silent mount options are given.
+
+ [NOTE] sloppy is probably unnecessary when userspace passes over one
+ option at a time since the error can just be ignored if userspace deems it
+ to be unimportant.
+
+ [NOTE] silent is probably redundant with sb_flags & SB_SILENT.
+
+The mount context is created by calling vfs_new_fs_context() or
+vfs_dup_fs_context() and is destroyed with put_fs_context(). Note that the
+structure is not refcounted.
+
+VFS, security and filesystem mount options are set individually with
+vfs_parse_mount_option(). Options provided by the old mount(2) system call as
+a page of data can be parsed with generic_parse_monolithic().
+
+When mounting, the filesystem is allowed to take data from any of the pointers
+and attach it to the superblock (or whatever), provided it clears the pointer
+in the mount context.
+
+The filesystem is also allowed to allocate resources and pin them with the
+mount context. For instance, NFS might pin the appropriate protocol version
+module.
+
+
+=================================
+THE FILESYSTEM CONTEXT OPERATIONS
+=================================
+
+The filesystem context points to a table of operations:
+
+ struct fs_context_operations {
+ void (*free)(struct fs_context *fc);
+ int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
+ int (*parse_param)(struct fs_context *fc,
+ struct struct fs_parameter *param);
+ int (*parse_monolithic)(struct fs_context *fc, void *data);
+ int (*get_tree)(struct fs_context *fc);
+ int (*reconfigure)(struct fs_context *fc);
+ };
+
+These operations are invoked by the various stages of the mount procedure to
+manage the filesystem context. They are as follows:
+
+ (*) void (*free)(struct fs_context *fc);
+
+ Called to clean up the filesystem-specific part of the filesystem context
+ when the context is destroyed. It should be aware that parts of the
+ context may have been removed and NULL'd out by ->get_tree().
+
+ (*) int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
+
+ Called when a filesystem context has been duplicated to duplicate the
+ filesystem-private data. An error may be returned to indicate failure to
+ do this.
+
+ [!] Note that even if this fails, put_fs_context() will be called
+ immediately thereafter, so ->dup() *must* make the
+ filesystem-private data safe for ->free().
+
+ (*) int (*parse_param)(struct fs_context *fc,
+ struct struct fs_parameter *param);
+
+ Called when a parameter is being added to the filesystem context. param
+ points to the key name and maybe a value object. VFS-specific options
+ will have been weeded out and fc->sb_flags updated in the context.
+ Security options will also have been weeded out and fc->security updated.
+
+ The parameter can be parsed with fs_parse() and fs_lookup_param(). Note
+ that the source(s) are presented as parameters named "source".
+
+ If successful, 0 should be returned or a negative error code otherwise.
+
+ (*) int (*parse_monolithic)(struct fs_context *fc, void *data);
+
+ Called when the mount(2) system call is invoked to pass the entire data
+ page in one go. If this is expected to be just a list of "key[=val]"
+ items separated by commas, then this may be set to NULL.
+
+ The return value is as for ->parse_param().
+
+ If the filesystem (e.g. NFS) needs to examine the data first and then
+ finds it's the standard key-val list then it may pass it off to
+ generic_parse_monolithic().
+
+ (*) int (*get_tree)(struct fs_context *fc);
+
+ Called to get or create the mountable root and superblock, using the
+ information stored in the filesystem context (reconfiguration goes via a
+ different vector). It may detach any resources it desires from the
+ filesystem context and transfer them to the superblock it creates.
+
+ On success it should set fc->root to the mountable root and return 0. In
+ the case of an error, it should return a negative error code.
+
+ The phase on a userspace-driven context will be set to only allow this to
+ be called once on any particular context.
+
+ (*) int (*reconfigure)(struct fs_context *fc);
+
+ Called to effect reconfiguration of a superblock using information stored
+ in the filesystem context. It may detach any resources it desires from
+ the filesystem context and transfer them to the superblock. The
+ superblock can be found from fc->root->d_sb.
+
+ On success it should return 0. In the case of an error, it should return
+ a negative error code.
+
+ [NOTE] reconfigure is intended as a replacement for remount_fs.
+
+
+===========================
+FILESYSTEM CONTEXT SECURITY
+===========================
+
+The filesystem context contains a security pointer that the LSMs can use for
+building up a security context for the superblock to be mounted. There are a
+number of operations used by the new mount code for this purpose:
+
+ (*) int security_fs_context_alloc(struct fs_context *fc,
+ struct dentry *reference);
+
+ Called to initialise fc->security (which is preset to NULL) and allocate
+ any resources needed. It should return 0 on success or a negative error
+ code on failure.
+
+ reference will be non-NULL if the context is being created for superblock
+ reconfiguration (FS_CONTEXT_FOR_RECONFIGURE) in which case it indicates
+ the root dentry of the superblock to be reconfigured. It will also be
+ non-NULL in the case of a submount (FS_CONTEXT_FOR_SUBMOUNT) in which case
+ it indicates the automount point.
+
+ (*) int security_fs_context_dup(struct fs_context *fc,
+ struct fs_context *src_fc);
+
+ Called to initialise fc->security (which is preset to NULL) and allocate
+ any resources needed. The original filesystem context is pointed to by
+ src_fc and may be used for reference. It should return 0 on success or a
+ negative error code on failure.
+
+ (*) void security_fs_context_free(struct fs_context *fc);
+
+ Called to clean up anything attached to fc->security. Note that the
+ contents may have been transferred to a superblock and the pointer cleared
+ during get_tree.
+
+ (*) int security_fs_context_parse_param(struct fs_context *fc,
+ struct fs_parameter *param);
+
+ Called for each mount parameter, including the source. The arguments are
+ as for the ->parse_param() method. It should return 0 to indicate that
+ the parameter should be passed on to the filesystem, 1 to indicate that
+ the parameter should be discarded or an error to indicate that the
+ parameter should be rejected.
+
+ The value pointed to by param may be modified (if a string) or stolen
+ (provided the value pointer is NULL'd out). If it is stolen, 1 must be
+ returned to prevent it being passed to the filesystem.
+
+ (*) int security_fs_context_validate(struct fs_context *fc);
+
+ Called after all the options have been parsed to validate the collection
+ as a whole and to do any necessary allocation so that
+ security_sb_get_tree() and security_sb_reconfigure() are less likely to
+ fail. It should return 0 or a negative error code.
+
+ In the case of reconfiguration, the target superblock will be accessible
+ via fc->root.
+
+ (*) int security_sb_get_tree(struct fs_context *fc);
+
+ Called during the mount procedure to verify that the specified superblock
+ is allowed to be mounted and to transfer the security data there. It
+ should return 0 or a negative error code.
+
+ (*) void security_sb_reconfigure(struct fs_context *fc);
+
+ Called to apply any reconfiguration to an LSM's context. It must not
+ fail. Error checking and resource allocation must be done in advance by
+ the parameter parsing and validation hooks.
+
+ (*) int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
+ unsigned int mnt_flags);
+
+ Called during the mount procedure to verify that the root dentry attached
+ to the context is permitted to be attached to the specified mountpoint.
+ It should return 0 on success or a negative error code on failure.
+
+
+=================================
+VFS FILESYSTEM CONTEXT OPERATIONS
+=================================
+
+There are four operations for creating a filesystem context and
+one for destroying a context:
+
+ (*) struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
+ struct dentry *reference,
+ unsigned int sb_flags,
+ unsigned int sb_flags_mask,
+ enum fs_context_purpose purpose);
+
+ Create a filesystem context for a given filesystem type and purpose. This
+ allocates the filesystem context, sets the superblock flags, initialises
+ the security and calls fs_type->init_fs_context() to initialise the
+ filesystem private data.
+
+ reference can be NULL or it may indicate the root dentry of a superblock
+ that is going to be reconfigured (FS_CONTEXT_FOR_RECONFIGURE) or
+ the automount point that triggered a submount (FS_CONTEXT_FOR_SUBMOUNT).
+ This is provided as a source of namespace information.
+
+ (*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
+
+ Duplicate a filesystem context, copying any options noted and duplicating
+ or additionally referencing any resources held therein. This is available
+ for use where a filesystem has to get a mount within a mount, such as NFS4
+ does by internally mounting the root of the target server and then doing a
+ private pathwalk to the target directory.
+
+ The purpose in the new context is inherited from the old one.
+
+ (*) void put_fs_context(struct fs_context *fc);
+
+ Destroy a filesystem context, releasing any resources it holds. This
+ calls the ->free() operation. This is intended to be called by anyone who
+ created a filesystem context.
+
+ [!] filesystem contexts are not refcounted, so this causes unconditional
+ destruction.
+
+In all the above operations, apart from the put op, the return is a mount
+context pointer or a negative error code.
+
+For the remaining operations, if an error occurs, a negative error code will be
+returned.
+
+ (*) int vfs_get_tree(struct fs_context *fc);
+
+ Get or create the mountable root and superblock, using the parameters in
+ the filesystem context to select/configure the superblock. This invokes
+ the ->validate() op and then the ->get_tree() op.
+
+ [NOTE] ->validate() could perhaps be rolled into ->get_tree() and
+ ->reconfigure().
+
+ (*) struct vfsmount *vfs_create_mount(struct fs_context *fc);
+
+ Create a mount given the parameters in the specified filesystem context.
+ Note that this does not attach the mount to anything.
+
+ (*) int vfs_parse_fs_param(struct fs_context *fc,
+ struct fs_parameter *param);
+
+ Supply a single mount parameter to the filesystem context. This include
+ the specification of the source/device which is specified as the "source"
+ parameter (which may be specified multiple times if the filesystem
+ supports that).
+
+ param specifies the parameter key name and the value. The parameter is
+ first checked to see if it corresponds to a standard mount flag (in which
+ case it is used to set an SB_xxx flag and consumed) or a security option
+ (in which case the LSM consumes it) before it is passed on to the
+ filesystem.
+
+ The parameter value is typed and can be one of:
+
+ fs_value_is_flag, Parameter not given a value.
+ fs_value_is_string, Value is a string
+ fs_value_is_blob, Value is a binary blob
+ fs_value_is_filename, Value is a filename* + dirfd
+ fs_value_is_filename_empty, Value is a filename* + dirfd + AT_EMPTY_PATH
+ fs_value_is_file, Value is an open file (file*)
+
+ If there is a value, that value is stored in a union in the struct in one
+ of param->{string,blob,name,file}. Note that the function may steal and
+ clear the pointer, but then becomes responsible for disposing of the
+ object.
+
+ (*) int vfs_parse_fs_string(struct fs_context *fc, char *key,
+ const char *value, size_t v_size);
+
+ A wrapper around vfs_parse_fs_param() that just passes a constant string.
+
+ (*) int generic_parse_monolithic(struct fs_context *fc, void *data);
+
+ Parse a sys_mount() data page, assuming the form to be a text list
+ consisting of key[=val] options separated by commas. Each item in the
+ list is passed to vfs_mount_option(). This is the default when the
+ ->parse_monolithic() operation is NULL.
+
+
+=====================
+PARAMETER DESCRIPTION
+=====================
+
+Parameters are described using structures defined in linux/fs_parser.h.
+There's a core description struct that links everything together:
+
+ struct fs_parameter_description {
+ const char name[16];
+ u8 nr_params;
+ u8 nr_alt_keys;
+ u8 nr_enums;
+ bool ignore_unknown;
+ bool no_source;
+ const char *const *keys;
+ const struct constant_table *alt_keys;
+ const struct fs_parameter_spec *specs;
+ const struct fs_parameter_enum *enums;
+ };
+
+For example:
+
+ enum afs_param {
+ Opt_autocell,
+ Opt_bar,
+ Opt_dyn,
+ Opt_foo,
+ Opt_source,
+ nr__afs_params
+ };
+
+ static const struct fs_parameter_description afs_fs_parameters = {
+ .name = "kAFS",
+ .nr_params = nr__afs_params,
+ .nr_alt_keys = ARRAY_SIZE(afs_param_alt_keys),
+ .nr_enums = ARRAY_SIZE(afs_param_enums),
+ .keys = afs_param_keys,
+ .alt_keys = afs_param_alt_keys,
+ .specs = afs_param_specs,
+ .enums = afs_param_enums,
+ };
+
+The members are as follows:
+
+ (1) const char name[16];
+
+ The name to be used in error messages generated by the parse helper
+ functions.
+
+ (2) u8 nr_params;
+
+ The number of discrete parameter identifiers. This indicates the number
+ of elements in the ->types[] array and also limits the values that may be
+ used in the values that the ->keys[] array maps to.
+
+ It is expected that, for example, two parameters that are related, say
+ "acl" and "noacl" with have the same ID, but will be flagged to indicate
+ that one is the inverse of the other. The value can then be picked out
+ from the parse result.
+
+ (3) const struct fs_parameter_specification *specs;
+
+ Table of parameter specifications, where the entries are of type:
+
+ struct fs_parameter_type {
+ enum fs_parameter_spec type:8;
+ u8 flags;
+ };
+
+ and the parameter identifier is the index to the array. 'type' indicates
+ the desired value type and must be one of:
+
+ TYPE NAME EXPECTED VALUE RESULT IN
+ ======================= ======================= =====================
+ fs_param_is_flag No value n/a
+ fs_param_is_bool Boolean value result->boolean
+ fs_param_is_u32 32-bit unsigned int result->uint_32
+ fs_param_is_u32_octal 32-bit octal int result->uint_32
+ fs_param_is_u32_hex 32-bit hex int result->uint_32
+ fs_param_is_s32 32-bit signed int result->int_32
+ fs_param_is_enum Enum value name result->uint_32
+ fs_param_is_string Arbitrary string param->string
+ fs_param_is_blob Binary blob param->blob
+ fs_param_is_blockdev Blockdev path * Needs lookup
+ fs_param_is_path Path * Needs lookup
+ fs_param_is_fd File descriptor param->file
+
+ And each parameter can be qualified with 'flags':
+
+ fs_param_v_optional The value is optional
+ fs_param_neg_with_no If key name is prefixed with "no", it is false
+ fs_param_neg_with_empty If value is "", it is false
+ fs_param_deprecated The parameter is deprecated.
+
+ For example:
+
+ static const struct fs_parameter_spec afs_param_specs[nr__afs_params] = {
+ [Opt_autocell] = { fs_param_is flag },
+ [Opt_bar] = { fs_param_is_enum },
+ [Opt_dyn] = { fs_param_is flag },
+ [Opt_foo] = { fs_param_is_bool, fs_param_neg_with_no },
+ [Opt_source] = { fs_param_is_string },
+ };
+
+ Note that if the value is of fs_param_is_bool type, fs_parse() will try
+ to match any string value against "0", "1", "no", "yes", "false", "true".
+
+ [!] NOTE that the table must be sorted according to primary key name so
+ that ->keys[] is also sorted.
+
+ (4) const char *const *keys;
+
+ Table of primary key names for the parameters. There must be one entry
+ per defined parameter. The table is optional if ->nr_params is 0. The
+ table is just an array of names e.g.:
+
+ static const char *const afs_param_keys[nr__afs_params] = {
+ [Opt_autocell] = "autocell",
+ [Opt_bar] = "bar",
+ [Opt_dyn] = "dyn",
+ [Opt_foo] = "foo",
+ [Opt_source] = "source",
+ };
+
+ [!] NOTE that the table must be sorted such that the table can be searched
+ with bsearch() using strcmp(). This means that the Opt_* values must
+ correspond to the entries in this table.
+
+ (5) const struct constant_table *alt_keys;
+ u8 nr_alt_keys;
+
+ Table of additional key names and their mappings to parameter ID plus the
+ number of elements in the table. This is optional. The table is just an
+ array of { name, integer } pairs, e.g.:
+
+ static const struct constant_table afs_param_keys[] = {
+ { "baz", Opt_bar },
+ { "dynamic", Opt_dyn },
+ };
+
+ [!] NOTE that the table must be sorted such that strcmp() can be used with
+ bsearch() to search the entries.
+
+ The parameter ID can also be fs_param_key_removed to indicate that a
+ deprecated parameter has been removed and that an error will be given.
+ This differs from fs_param_deprecated where the parameter may still have
+ an effect.
+
+ Further, the behaviour of the parameter may differ when an alternate name
+ is used (for instance with NFS, "v3", "v4.2", etc. are alternate names).
+
+ (6) const struct fs_parameter_enum *enums;
+ u8 nr_enums;
+
+ Table of enum value names to integer mappings and the number of elements
+ stored therein. This is of type:
+
+ struct fs_parameter_enum {
+ u8 param_id;
+ char name[14];
+ u8 value;
+ };
+
+ Where the array is an unsorted list of { parameter ID, name }-keyed
+ elements that indicate the value to map to, e.g.:
+
+ static const struct fs_parameter_enum afs_param_enums[] = {
+ { Opt_bar, "x", 1},
+ { Opt_bar, "y", 23},
+ { Opt_bar, "z", 42},
+ };
+
+ If a parameter of type fs_param_is_enum is encountered, fs_parse() will
+ try to look the value up in the enum table and the result will be stored
+ in the parse result.
+
+ (7) bool no_source;
+
+ If this is set, fs_parse() will ignore any "source" parameter and not
+ pass it to the filesystem.
+
+The parser should be pointed to by the parser pointer in the file_system_type
+struct as this will provide validation on registration (if
+CONFIG_VALIDATE_FS_PARSER=y) and will allow the description to be queried from
+userspace using the fsinfo() syscall.
+
+
+==========================
+PARAMETER HELPER FUNCTIONS
+==========================
+
+A number of helper functions are provided to help a filesystem or an LSM
+process the parameters it is given.
+
+ (*) int lookup_constant(const struct constant_table tbl[],
+ const char *name, int not_found);
+
+ Look up a constant by name in a table of name -> integer mappings. The
+ table is an array of elements of the following type:
+
+ struct constant_table {
+ const char *name;
+ int value;
+ };
+
+ and it must be sorted such that it can be searched using bsearch() using
+ strcmp(). If a match is found, the corresponding value is returned. If a
+ match isn't found, the not_found value is returned instead.
+
+ (*) bool validate_constant_table(const struct constant_table *tbl,
+ size_t tbl_size,
+ int low, int high, int special);
+
+ Validate a constant table. Checks that all the elements are appropriately
+ ordered, that there are no duplicates and that the values are between low
+ and high inclusive, though provision is made for one allowable special
+ value outside of that range. If no special value is required, special
+ should just be set to lie inside the low-to-high range.
+
+ If all is good, true is returned. If the table is invalid, errors are
+ logged to dmesg, the stack is dumped and false is returned.
+
+ (*) int fs_parse(struct fs_context *fc,
+ const struct fs_param_parser *parser,
+ struct fs_parameter *param,
+ struct fs_param_parse_result *result);
+
+ This is the main interpreter of parameters. It uses the parameter
+ description (parser) to look up the name of the parameter to use and to
+ convert that to a parameter ID (stored in result->key).
+
+ If successful, and if the parameter type indicates the result is a
+ boolean, integer or enum type, the value is converted by this function and
+ the result stored in result->{boolean,int_32,uint_32}.
+
+ If a match isn't initially made, the key is prefixed with "no" and no
+ value is present then an attempt will be made to look up the key with the
+ prefix removed. If this matches a parameter for which the type has flag
+ fs_param_neg_with_no set, then a match will be made and the value will be
+ set to false/0/NULL.
+
+ If the parameter is successfully matched and, optionally, parsed
+ correctly, 1 is returned. If the parameter isn't matched and
+ parser->ignore_unknown is set, then 0 is returned. Otherwise -EINVAL is
+ returned.
+
+ (*) bool fs_validate_description(const struct fs_parameter_description *desc);
+
+ This is validates the parameter description. It returns true if the
+ description is good and false if it is not.
+
+ (*) int fs_lookup_param(struct fs_context *fc,
+ struct fs_parameter *value,
+ bool want_bdev,
+ struct path *_path);
+
+ This takes a parameter that carries a string or filename type and attempts
+ to do a path lookup on it. If the parameter expects a blockdev, a check
+ is made that the inode actually represents one.
+
+ Returns 0 if successful and *_path will be set; returns a negative error
+ code if not.
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index 9d6b68853f5b..434a07b0002b 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1,3 +1,18 @@
+===============
+Pathname lookup
+===============
+
+This write-up is based on three articles published at lwn.net:
+
+- <https://lwn.net/Articles/649115/> Pathname lookup in Linux
+- <https://lwn.net/Articles/649729/> RCU-walk: faster pathname lookup in Linux
+- <https://lwn.net/Articles/650786/> A walk among the symlinks
+
+Written by Neil Brown with help from Al Viro and Jon Corbet.
+It has subsequently been updated to reflect changes in the kernel
+including:
+
+- per-directory parallel name lookup.
Introduction to pathname lookup
===============================
@@ -344,7 +359,7 @@ In particular it is held while scanning chains in the dcache hash
table, and the mount point hash table.
Bringing it together with ``struct nameidata``
---------------------------------------------
+----------------------------------------------
.. _First edition Unix: http://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/u2.s
@@ -355,7 +370,7 @@ converts a "name" to an "inode". ``struct nameidata`` contains (among
other fields):
``struct path path``
-~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
A ``path`` contains a ``struct vfsmount`` (which is
embedded in a ``struct mount``) and a ``struct dentry``. Together these
@@ -366,13 +381,13 @@ step. A reference through ``d_lockref`` and ``mnt_count`` is always
held.
``struct qstr last``
-~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
This is a string together with a length (i.e. _not_ ``nul`` terminated)
that is the "next" component in the pathname.
``int last_type``
-~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~
This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT``, ``LAST_DOTDOT``, or
``LAST_BIND``. The ``last`` field is only valid if the type is
@@ -381,7 +396,7 @@ components of the symlink have been processed yet. Others should be
fairly self-explanatory.
``struct path root``
-~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
This is used to hold a reference to the effective root of the
filesystem. Often that reference won't be needed, so this field is
@@ -510,7 +525,7 @@ potentially interesting things about these dentries corresponding
to three different flags that might be set in ``dentry->d_flags``:
``DCACHE_MANAGE_TRANSIT``
-~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~
If this flag has been set, then the filesystem has requested that the
``d_manage()`` dentry operation be called before handling any possible
@@ -529,7 +544,7 @@ filesystem, which will then give it a special pass through
``d_manage()`` by returning ``-EISDIR``.
``DCACHE_MOUNTED``
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~
This flag is set on every dentry that is mounted on. As Linux
supports multiple filesystem namespaces, it is possible that the
@@ -542,7 +557,7 @@ If this flag is set, and ``d_manage()`` didn't return ``-EISDIR``,
and a new ``dentry`` (both with counted references).
``DCACHE_NEED_AUTOMOUNT``
-~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~
If ``d_manage()`` allowed us to get this far, and ``lookup_mnt()`` didn't
find a mount point, then this flag causes the ``d_automount()`` dentry
@@ -698,7 +713,7 @@ With that little refresher on seqlocks out of the way we can look at
the bigger picture of how RCU-walk uses seqlocks.
``mount_lock`` and ``nd->m_seq``
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We already met the ``mount_lock`` seqlock when REF-walk used it to
ensure that crossing a mount point is performed safely. RCU-walk uses
@@ -727,7 +742,7 @@ results would have been the same. This ensures the invariant holds,
at least for vfsmount structures.
``dentry->d_seq`` and ``nd->seq``
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In place of taking a count or lock on ``d_reflock``, RCU-walk samples
the per-dentry ``d_seq`` seqlock, and stores the sequence number in the
@@ -774,7 +789,7 @@ getting a counted reference to the new dentry before dropping that for
the old dentry which we saw in REF-walk.
No ``inode->i_rwsem`` or even ``rename_lock``
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A semaphore is a fairly heavyweight lock that can only be taken when it is
permissible to sleep. As ``rcu_read_lock()`` forbids sleeping,
@@ -796,7 +811,7 @@ locking. This neatly handles all cases, so adding extra checks on
rename_lock would bring no significant value.
``unlazy walk()`` and ``complete_walk()``
--------------------------------------
+-----------------------------------------
That "dropping down to REF-walk" typically involves a call to
``unlazy_walk()``, so named because "RCU-walk" is also sometimes
diff --git a/Documentation/filesystems/splice.rst b/Documentation/filesystems/splice.rst
new file mode 100644
index 000000000000..edd874808472
--- /dev/null
+++ b/Documentation/filesystems/splice.rst
@@ -0,0 +1,22 @@
+================
+splice and pipes
+================
+
+splice API
+==========
+
+splice is a method for moving blocks of data around inside the kernel,
+without continually transferring them between the kernel and user space.
+
+.. kernel-doc:: fs/splice.c
+
+pipes API
+=========
+
+Pipe interfaces are all for in-kernel (builtin image) use. They are not
+exported for use by modules.
+
+.. kernel-doc:: include/linux/pipe_fs_i.h
+ :internal:
+
+.. kernel-doc:: fs/pipe.c
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index a1426cabcef1..5b5311f9358d 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -116,6 +116,27 @@ static struct device_attribute dev_attr_foo = {
.store = store_foo,
};
+Note as stated in include/linux/kernel.h "OTHER_WRITABLE? Generally
+considered a bad idea." so trying to set a sysfs file writable for
+everyone will fail reverting to RO mode for "Others".
+
+For the common cases sysfs.h provides convenience macros to make
+defining attributes easier as well as making code more concise and
+readable. The above case could be shortened to:
+
+static struct device_attribute dev_attr_foo = __ATTR_RW(foo);
+
+the list of helpers available to define your wrapper function is:
+__ATTR_RO(name): assumes default name_show and mode 0444
+__ATTR_WO(name): assumes a name_store only and is restricted to mode
+ 0200 that is root write access only.
+__ATTR_RO_MODE(name, mode): fore more restrictive RO access currently
+ only use case is the EFI System Resource Table
+ (see drivers/firmware/efi/esrt.c)
+__ATTR_RW(name): assumes default name_show, name_store and setting
+ mode to 0644.
+__ATTR_NULL: which sets the name to NULL and is used as end of list
+ indicator (see: kernel/workqueue.c)
Subsystem-Specific Callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -344,7 +365,9 @@ struct bus_attribute {
Declaring:
-BUS_ATTR(_name, _mode, _show, _store)
+static BUS_ATTR_RW(name);
+static BUS_ATTR_RO(name);
+static BUS_ATTR_WO(name);
Creation/Removal:
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 8dc8e9c2913f..761c6fd24a53 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,7 @@ struct file_operations {
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+ int (*iopoll)(struct kiocb *kiocb, bool spin);
int (*iterate) (struct file *, struct dir_context *);
int (*iterate_shared) (struct file *, struct dir_context *);
__poll_t (*poll) (struct file *, struct poll_table_struct *);
@@ -902,6 +903,8 @@ otherwise noted.
write_iter: possibly asynchronous write with iov_iter as source
+ iopoll: called when aio wants to poll for completions on HIPRI iocbs
+
iterate: called when the VFS needs to read the directory contents
iterate_shared: called when the VFS needs to read the directory contents
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 9ccfd1bc6201..a5cbb5e0e3db 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -272,7 +272,7 @@ The following sysctls are available for the XFS filesystem:
XFS_ERRLEVEL_LOW: 1
XFS_ERRLEVEL_HIGH: 5
- fs.xfs.panic_mask (Min: 0 Default: 0 Max: 255)
+ fs.xfs.panic_mask (Min: 0 Default: 0 Max: 256)
Causes certain error conditions to call BUG(). Value is a bitmask;
OR together the tags which represent errors which should cause panics:
@@ -285,6 +285,7 @@ The following sysctls are available for the XFS filesystem:
XFS_PTAG_SHUTDOWN_IOERROR 0x00000020
XFS_PTAG_SHUTDOWN_LOGERROR 0x00000040
XFS_PTAG_FSBLOCK_ZERO 0x00000080
+ XFS_PTAG_VERIFIER_ERROR 0x00000100
This option is intended for debugging only.