aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems/ext4
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems/ext4')
-rw-r--r--Documentation/filesystems/ext4/bigalloc.rst32
-rw-r--r--Documentation/filesystems/ext4/blockgroup.rst10
-rw-r--r--Documentation/filesystems/ext4/blocks.rst4
-rw-r--r--Documentation/filesystems/ext4/directory.rst2
-rw-r--r--Documentation/filesystems/ext4/group_descr.rst9
-rw-r--r--Documentation/filesystems/ext4/index.rst8
-rw-r--r--Documentation/filesystems/ext4/inodes.rst10
-rw-r--r--Documentation/filesystems/ext4/overview.rst1
-rw-r--r--Documentation/filesystems/ext4/super.rst22
-rw-r--r--Documentation/filesystems/ext4/verity.rst41
10 files changed, 105 insertions, 34 deletions
diff --git a/Documentation/filesystems/ext4/bigalloc.rst b/Documentation/filesystems/ext4/bigalloc.rst
index c6d88557553c..72075aa608e4 100644
--- a/Documentation/filesystems/ext4/bigalloc.rst
+++ b/Documentation/filesystems/ext4/bigalloc.rst
@@ -9,14 +9,26 @@ ext4 code is not prepared to handle the case where the block size
exceeds the page size. However, for a filesystem of mostly huge files,
it is desirable to be able to allocate disk blocks in units of multiple
blocks to reduce both fragmentation and metadata overhead. The
-`bigalloc <Bigalloc>`__ feature provides exactly this ability. The
-administrator can set a block cluster size at mkfs time (which is stored
-in the s\_log\_cluster\_size field in the superblock); from then on, the
-block bitmaps track clusters, not individual blocks. This means that
-block groups can be several gigabytes in size (instead of just 128MiB);
-however, the minimum allocation unit becomes a cluster, not a block,
-even for directories. TaoBao had a patchset to extend the “use units of
-clusters instead of blocks” to the extent tree, though it is not clear
-where those patches went-- they eventually morphed into “extent tree v2”
-but that code has not landed as of May 2015.
+bigalloc feature provides exactly this ability.
+
+The bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to
+use clustered allocation, so that each bit in the ext4 block allocation
+bitmap addresses a power of two number of blocks. For example, if the
+file system is mainly going to be storing large files in the 4-32
+megabyte range, it might make sense to set a cluster size of 1 megabyte.
+This means that each bit in the block allocation bitmap now addresses
+256 4k blocks. This shrinks the total size of the block allocation
+bitmaps for a 2T file system from 64 megabytes to 256 kilobytes. It also
+means that a block group addresses 32 gigabytes instead of 128 megabytes,
+also shrinking the amount of file system overhead for metadata.
+
+The administrator can set a block cluster size at mkfs time (which is
+stored in the s\_log\_cluster\_size field in the superblock); from then
+on, the block bitmaps track clusters, not individual blocks. This means
+that block groups can be several gigabytes in size (instead of just
+128MiB); however, the minimum allocation unit becomes a cluster, not a
+block, even for directories. TaoBao had a patchset to extend the “use
+units of clusters instead of blocks” to the extent tree, though it is
+not clear where those patches went-- they eventually morphed into
+“extent tree v2” but that code has not landed as of May 2015.
diff --git a/Documentation/filesystems/ext4/blockgroup.rst b/Documentation/filesystems/ext4/blockgroup.rst
index baf888e4c06a..3da156633339 100644
--- a/Documentation/filesystems/ext4/blockgroup.rst
+++ b/Documentation/filesystems/ext4/blockgroup.rst
@@ -71,11 +71,11 @@ if the flex\_bg size is 4, then group 0 will contain (in order) the
superblock, group descriptors, data block bitmaps for groups 0-3, inode
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
space in group 0 is for file data. The effect of this is to group the
-block metadata close together for faster loading, and to enable large
-files to be continuous on disk. Backup copies of the superblock and
-group descriptors are always at the beginning of block groups, even if
-flex\_bg is enabled. The number of block groups that make up a flex\_bg
-is given by 2 ^ ``sb.s_log_groups_per_flex``.
+block group metadata close together for faster loading, and to enable
+large files to be continuous on disk. Backup copies of the superblock
+and group descriptors are always at the beginning of block groups, even
+if flex\_bg is enabled. The number of block groups that make up a
+flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
Meta Block Groups
-----------------
diff --git a/Documentation/filesystems/ext4/blocks.rst b/Documentation/filesystems/ext4/blocks.rst
index 73d4dc0f7bda..bd722ecd92d6 100644
--- a/Documentation/filesystems/ext4/blocks.rst
+++ b/Documentation/filesystems/ext4/blocks.rst
@@ -10,7 +10,9 @@ block groups. Block size is specified at mkfs time and typically is
4KiB. You may experience mounting problems if block size is greater than
page size (i.e. 64KiB blocks on a i386 which only has 4KiB memory
pages). By default a filesystem can contain 2^32 blocks; if the '64bit'
-feature is enabled, then a filesystem can have 2^64 blocks.
+feature is enabled, then a filesystem can have 2^64 blocks. The location
+of structures is stored in terms of the block number the structure lives
+in and not the absolute offset on disk.
For 32-bit filesystems, limits are as follows:
diff --git a/Documentation/filesystems/ext4/directory.rst b/Documentation/filesystems/ext4/directory.rst
index 614034e24669..073940cc64ed 100644
--- a/Documentation/filesystems/ext4/directory.rst
+++ b/Documentation/filesystems/ext4/directory.rst
@@ -59,7 +59,7 @@ is at most 263 bytes long, though on disk you'll need to reference
- File name.
Since file names cannot be longer than 255 bytes, the new directory
-entry format shortens the rec\_len field and uses the space for a file
+entry format shortens the name\_len field and uses the space for a file
type flag, probably to avoid having to load every inode during directory
tree traversal. This format is ``ext4_dir_entry_2``, which is at most
263 bytes long, though on disk you'll need to reference
diff --git a/Documentation/filesystems/ext4/group_descr.rst b/Documentation/filesystems/ext4/group_descr.rst
index 0f783ed88592..7ba6114e7f5c 100644
--- a/Documentation/filesystems/ext4/group_descr.rst
+++ b/Documentation/filesystems/ext4/group_descr.rst
@@ -99,9 +99,12 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
* - 0x1E
- \_\_le16
- bg\_checksum
- - Group descriptor checksum; crc16(sb\_uuid+group+desc) if the
- RO\_COMPAT\_GDT\_CSUM feature is set, or crc32c(sb\_uuid+group\_desc) &
- 0xFFFF if the RO\_COMPAT\_METADATA\_CSUM feature is set.
+ - Group descriptor checksum; crc16(sb\_uuid+group\_num+bg\_desc) if the
+ RO\_COMPAT\_GDT\_CSUM feature is set, or
+ crc32c(sb\_uuid+group\_num+bg\_desc) & 0xFFFF if the
+ RO\_COMPAT\_METADATA\_CSUM feature is set. The bg\_checksum
+ field in bg\_desc is skipped when calculating crc16 checksum,
+ and set to zero if crc32c checksum is used.
* -
-
-
diff --git a/Documentation/filesystems/ext4/index.rst b/Documentation/filesystems/ext4/index.rst
index 3be3e54d480d..705d813d558f 100644
--- a/Documentation/filesystems/ext4/index.rst
+++ b/Documentation/filesystems/ext4/index.rst
@@ -8,7 +8,7 @@ ext4 Data Structures and Algorithms
:maxdepth: 6
:numbered:
- about.rst
- overview.rst
- globals.rst
- dynamic.rst
+ about
+ overview
+ globals
+ dynamic
diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst
index 6bd35e506b6f..a65baffb4ebf 100644
--- a/Documentation/filesystems/ext4/inodes.rst
+++ b/Documentation/filesystems/ext4/inodes.rst
@@ -277,6 +277,8 @@ The ``i_flags`` field is a combination of these values:
- This is a huge file (EXT4\_HUGE\_FILE\_FL).
* - 0x80000
- Inode uses extents (EXT4\_EXTENTS\_FL).
+ * - 0x100000
+ - Verity protected file (EXT4\_VERITY\_FL).
* - 0x200000
- Inode stores a large extended attribute value in its data blocks
(EXT4\_EA\_INODE\_FL).
@@ -299,9 +301,9 @@ The ``i_flags`` field is a combination of these values:
- Reserved for ext4 library (EXT4\_RESERVED\_FL).
* -
- Aggregate flags:
- * - 0x4BDFFF
+ * - 0x705BDFFF
- User-visible flags.
- * - 0x4B80FF
+ * - 0x604BC0FF
- User-modifiable flags. Note that while EXT4\_JOURNAL\_DATA\_FL and
EXT4\_EXTENTS\_FL can be set with setattr, they are not in the kernel's
EXT4\_FL\_USER\_MODIFIABLE mask, since it needs to handle the setting of
@@ -470,8 +472,8 @@ inode, which allows struct ext4\_inode to grow for a new kernel without
having to upgrade all of the on-disk inodes. Access to fields beyond
EXT2\_GOOD\_OLD\_INODE\_SIZE should be verified to be within
``i_extra_isize``. By default, ext4 inode records are 256 bytes, and (as
-of October 2013) the inode structure is 156 bytes
-(``i_extra_isize = 28``). The extra space between the end of the inode
+of August 2019) the inode structure is 160 bytes
+(``i_extra_isize = 32``). The extra space between the end of the inode
structure and the end of the inode record can be used to store extended
attributes. Each inode record can be as large as the filesystem block
size, though this is not terribly efficient.
diff --git a/Documentation/filesystems/ext4/overview.rst b/Documentation/filesystems/ext4/overview.rst
index cbab18baba12..123ebfde47ee 100644
--- a/Documentation/filesystems/ext4/overview.rst
+++ b/Documentation/filesystems/ext4/overview.rst
@@ -24,3 +24,4 @@ order.
.. include:: bigalloc.rst
.. include:: inlinedata.rst
.. include:: eainode.rst
+.. include:: verity.rst
diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 04ff079a2acf..93e55d7c1d40 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -58,7 +58,7 @@ The ext4 superblock is laid out as follows in
* - 0x1C
- \_\_le32
- s\_log\_cluster\_size
- - Cluster size is (2 ^ s\_log\_cluster\_size) blocks if bigalloc is
+ - Cluster size is 2 ^ (10 + s\_log\_cluster\_size) blocks if bigalloc is
enabled. Otherwise s\_log\_cluster\_size must equal s\_log\_block\_size.
* - 0x20
- \_\_le32
@@ -447,7 +447,7 @@ The ext4 superblock is laid out as follows in
- Upper 8 bits of the s_wtime field.
* - 0x275
- \_\_u8
- - s\_wtime_hi
+ - s\_mtime_hi
- Upper 8 bits of the s_mtime field.
* - 0x276
- \_\_u8
@@ -466,12 +466,20 @@ The ext4 superblock is laid out as follows in
- s\_last_error_time_hi
- Upper 8 bits of the s_last_error_time_hi field.
* - 0x27A
- - \_\_u8[2]
- - s\_pad
+ - \_\_u8
+ - s\_pad[2]
- Zero padding.
* - 0x27C
+ - \_\_le16
+ - s\_encoding
+ - Filename charset encoding.
+ * - 0x27E
+ - \_\_le16
+ - s\_encoding_flags
+ - Filename charset encoding flags.
+ * - 0x280
- \_\_le32
- - s\_reserved[96]
+ - s\_reserved[95]
- Padding to the end of the block.
* - 0x3FC
- \_\_le32
@@ -617,7 +625,7 @@ following:
* - 0x80
- Enable a filesystem size of 2^64 blocks (INCOMPAT\_64BIT).
* - 0x100
- - Multiple mount protection. Not implemented (INCOMPAT\_MMP).
+ - Multiple mount protection (INCOMPAT\_MMP).
* - 0x200
- Flexible block groups. See the earlier discussion of this feature
(INCOMPAT\_FLEX\_BG).
@@ -696,6 +704,8 @@ the following:
(RO\_COMPAT\_READONLY)
* - 0x2000
- Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT)
+ * - 0x8000
+ - Verity inodes may be present on the filesystem. (RO\_COMPAT\_VERITY)
.. _super_def_hash:
diff --git a/Documentation/filesystems/ext4/verity.rst b/Documentation/filesystems/ext4/verity.rst
new file mode 100644
index 000000000000..3e4c0ee0e068
--- /dev/null
+++ b/Documentation/filesystems/ext4/verity.rst
@@ -0,0 +1,41 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Verity files
+------------
+
+ext4 supports fs-verity, which is a filesystem feature that provides
+Merkle tree based hashing for individual readonly files. Most of
+fs-verity is common to all filesystems that support it; see
+:ref:`Documentation/filesystems/fsverity.rst <fsverity>` for the
+fs-verity documentation. However, the on-disk layout of the verity
+metadata is filesystem-specific. On ext4, the verity metadata is
+stored after the end of the file data itself, in the following format:
+
+- Zero-padding to the next 65536-byte boundary. This padding need not
+ actually be allocated on-disk, i.e. it may be a hole.
+
+- The Merkle tree, as documented in
+ :ref:`Documentation/filesystems/fsverity.rst
+ <fsverity_merkle_tree>`, with the tree levels stored in order from
+ root to leaf, and the tree blocks within each level stored in their
+ natural order.
+
+- Zero-padding to the next filesystem block boundary.
+
+- The verity descriptor, as documented in
+ :ref:`Documentation/filesystems/fsverity.rst <fsverity_descriptor>`,
+ with optionally appended signature blob.
+
+- Zero-padding to the next offset that is 4 bytes before a filesystem
+ block boundary.
+
+- The size of the verity descriptor in bytes, as a 4-byte little
+ endian integer.
+
+Verity inodes have EXT4_VERITY_FL set, and they must use extents, i.e.
+EXT4_EXTENTS_FL must be set and EXT4_INLINE_DATA_FL must be clear.
+They can have EXT4_ENCRYPT_FL set, in which case the verity metadata
+is encrypted as well as the data itself.
+
+Verity files cannot have blocks allocated past the end of the verity
+metadata.