diff options
Diffstat (limited to 'Documentation/filesystems/ext4/ondisk/directory.rst')
-rw-r--r-- | Documentation/filesystems/ext4/ondisk/directory.rst | 426 |
1 files changed, 0 insertions, 426 deletions
diff --git a/Documentation/filesystems/ext4/ondisk/directory.rst b/Documentation/filesystems/ext4/ondisk/directory.rst deleted file mode 100644 index 8fcba68c2884..000000000000 --- a/Documentation/filesystems/ext4/ondisk/directory.rst +++ /dev/null @@ -1,426 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -Directory Entries ------------------ - -In an ext4 filesystem, a directory is more or less a flat file that maps -an arbitrary byte string (usually ASCII) to an inode number on the -filesystem. There can be many directory entries across the filesystem -that reference the same inode number--these are known as hard links, and -that is why hard links cannot reference files on other filesystems. As -such, directory entries are found by reading the data block(s) -associated with a directory file for the particular directory entry that -is desired. - -Linear (Classic) Directories -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -By default, each directory lists its entries in an “almost-linear” -array. I write “almost” because it's not a linear array in the memory -sense because directory entries are not split across filesystem blocks. -Therefore, it is more accurate to say that a directory is a series of -data blocks and that each block contains a linear array of directory -entries. The end of each per-block array is signified by reaching the -end of the block; the last entry in the block has a record length that -takes it all the way to the end of the block. The end of the entire -directory is of course signified by reaching the end of the file. Unused -directory entries are signified by inode = 0. By default the filesystem -uses ``struct ext4_dir_entry_2`` for directory entries unless the -“filetype” feature flag is not set, in which case it uses -``struct ext4_dir_entry``. - -The original directory entry format is ``struct ext4_dir_entry``, which -is at most 263 bytes long, though on disk you'll need to reference -``dirent.rec_len`` to know for sure. - -.. list-table:: - :widths: 1 1 1 77 - :header-rows: 1 - - * - Offset - - Size - - Name - - Description - * - 0x0 - - \_\_le32 - - inode - - Number of the inode that this directory entry points to. - * - 0x4 - - \_\_le16 - - rec\_len - - Length of this directory entry. Must be a multiple of 4. - * - 0x6 - - \_\_le16 - - name\_len - - Length of the file name. - * - 0x8 - - char - - name[EXT4\_NAME\_LEN] - - File name. - -Since file names cannot be longer than 255 bytes, the new directory -entry format shortens the rec\_len field and uses the space for a file -type flag, probably to avoid having to load every inode during directory -tree traversal. This format is ``ext4_dir_entry_2``, which is at most -263 bytes long, though on disk you'll need to reference -``dirent.rec_len`` to know for sure. - -.. list-table:: - :widths: 1 1 1 77 - :header-rows: 1 - - * - Offset - - Size - - Name - - Description - * - 0x0 - - \_\_le32 - - inode - - Number of the inode that this directory entry points to. - * - 0x4 - - \_\_le16 - - rec\_len - - Length of this directory entry. - * - 0x6 - - \_\_u8 - - name\_len - - Length of the file name. - * - 0x7 - - \_\_u8 - - file\_type - - File type code, see ftype_ table below. - * - 0x8 - - char - - name[EXT4\_NAME\_LEN] - - File name. - -.. _ftype: - -The directory file type is one of the following values: - -.. list-table:: - :widths: 1 79 - :header-rows: 1 - - * - Value - - Description - * - 0x0 - - Unknown. - * - 0x1 - - Regular file. - * - 0x2 - - Directory. - * - 0x3 - - Character device file. - * - 0x4 - - Block device file. - * - 0x5 - - FIFO. - * - 0x6 - - Socket. - * - 0x7 - - Symbolic link. - -In order to add checksums to these classic directory blocks, a phony -``struct ext4_dir_entry`` is placed at the end of each leaf block to -hold the checksum. The directory entry is 12 bytes long. The inode -number and name\_len fields are set to zero to fool old software into -ignoring an apparently empty directory entry, and the checksum is stored -in the place where the name normally goes. The structure is -``struct ext4_dir_entry_tail``: - -.. list-table:: - :widths: 1 1 1 77 - :header-rows: 1 - - * - Offset - - Size - - Name - - Description - * - 0x0 - - \_\_le32 - - det\_reserved\_zero1 - - Inode number, which must be zero. - * - 0x4 - - \_\_le16 - - det\_rec\_len - - Length of this directory entry, which must be 12. - * - 0x6 - - \_\_u8 - - det\_reserved\_zero2 - - Length of the file name, which must be zero. - * - 0x7 - - \_\_u8 - - det\_reserved\_ft - - File type, which must be 0xDE. - * - 0x8 - - \_\_le32 - - det\_checksum - - Directory leaf block checksum. - -The leaf directory block checksum is calculated against the FS UUID, the -directory's inode number, the directory's inode generation number, and -the entire directory entry block up to (but not including) the fake -directory entry. - -Hash Tree Directories -~~~~~~~~~~~~~~~~~~~~~ - -A linear array of directory entries isn't great for performance, so a -new feature was added to ext3 to provide a faster (but peculiar) -balanced tree keyed off a hash of the directory entry name. If the -EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a -hashed btree (htree) to organize and find directory entries. For -backwards read-only compatibility with ext2, this tree is actually -hidden inside the directory file, masquerading as “empty” directory data -blocks! It was stated previously that the end of the linear directory -entry table was signified with an entry pointing to inode 0; this is -(ab)used to fool the old linear-scan algorithm into thinking that the -rest of the directory block is empty so that it moves on. - -The root of the tree always lives in the first data block of the -directory. By ext2 custom, the '.' and '..' entries must appear at the -beginning of this first block, so they are put here as two -``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of -the root node contains metadata about the tree and finally a hash->block -map to find nodes that are lower in the htree. If -``dx_root.info.indirect_levels`` is non-zero then the htree has two -levels; the data block pointed to by the root node's map is an interior -node, which is indexed by a minor hash. Interior nodes in this tree -contains a zeroed out ``struct ext4_dir_entry_2`` followed by a -minor\_hash->block map to find leafe nodes. Leaf nodes contain a linear -array of all ``struct ext4_dir_entry_2``; all of these entries -(presumably) hash to the same value. If there is an overflow, the -entries simply overflow into the next leaf node, and the -least-significant bit of the hash (in the interior node map) that gets -us to this next leaf node is set. - -To traverse the directory as a htree, the code calculates the hash of -the desired file name and uses it to find the corresponding block -number. If the tree is flat, the block is a linear array of directory -entries that can be searched; otherwise, the minor hash of the file name -is computed and used against this second block to find the corresponding -third block number. That third block number will be a linear array of -directory entries. - -To traverse the directory as a linear array (such as the old code does), -the code simply reads every data block in the directory. The blocks used -for the htree will appear to have no entries (aside from '.' and '..') -and so only the leaf nodes will appear to have any interesting content. - -The root of the htree is in ``struct dx_root``, which is the full length -of a data block: - -.. list-table:: - :widths: 1 1 1 77 - :header-rows: 1 - - * - Offset - - Type - - Name - - Description - * - 0x0 - - \_\_le32 - - dot.inode - - inode number of this directory. - * - 0x4 - - \_\_le16 - - dot.rec\_len - - Length of this record, 12. - * - 0x6 - - u8 - - dot.name\_len - - Length of the name, 1. - * - 0x7 - - u8 - - dot.file\_type - - File type of this entry, 0x2 (directory) (if the feature flag is set). - * - 0x8 - - char - - dot.name[4] - - “.\\0\\0\\0” - * - 0xC - - \_\_le32 - - dotdot.inode - - inode number of parent directory. - * - 0x10 - - \_\_le16 - - dotdot.rec\_len - - block\_size - 12. The record length is long enough to cover all htree - data. - * - 0x12 - - u8 - - dotdot.name\_len - - Length of the name, 2. - * - 0x13 - - u8 - - dotdot.file\_type - - File type of this entry, 0x2 (directory) (if the feature flag is set). - * - 0x14 - - char - - dotdot\_name[4] - - “..\\0\\0” - * - 0x18 - - \_\_le32 - - struct dx\_root\_info.reserved\_zero - - Zero. - * - 0x1C - - u8 - - struct dx\_root\_info.hash\_version - - Hash type, see dirhash_ table below. - * - 0x1D - - u8 - - struct dx\_root\_info.info\_length - - Length of the tree information, 0x8. - * - 0x1E - - u8 - - struct dx\_root\_info.indirect\_levels - - Depth of the htree. Cannot be larger than 3 if the INCOMPAT\_LARGEDIR - feature is set; cannot be larger than 2 otherwise. - * - 0x1F - - u8 - - struct dx\_root\_info.unused\_flags - - - * - 0x20 - - \_\_le16 - - limit - - Maximum number of dx\_entries that can follow this header, plus 1 for - the header itself. - * - 0x22 - - \_\_le16 - - count - - Actual number of dx\_entries that follow this header, plus 1 for the - header itself. - * - 0x24 - - \_\_le32 - - block - - The block number (within the directory file) that goes with hash=0. - * - 0x28 - - struct dx\_entry - - entries[0] - - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. - -.. _dirhash: - -The directory hash is one of the following values: - -.. list-table:: - :widths: 1 79 - :header-rows: 1 - - * - Value - - Description - * - 0x0 - - Legacy. - * - 0x1 - - Half MD4. - * - 0x2 - - Tea. - * - 0x3 - - Legacy, unsigned. - * - 0x4 - - Half MD4, unsigned. - * - 0x5 - - Tea, unsigned. - -Interior nodes of an htree are recorded as ``struct dx_node``, which is -also the full length of a data block: - -.. list-table:: - :widths: 1 1 1 77 - :header-rows: 1 - - * - Offset - - Type - - Name - - Description - * - 0x0 - - \_\_le32 - - fake.inode - - Zero, to make it look like this entry is not in use. - * - 0x4 - - \_\_le16 - - fake.rec\_len - - The size of the block, in order to hide all of the dx\_node data. - * - 0x6 - - u8 - - name\_len - - Zero. There is no name for this “unused” directory entry. - * - 0x7 - - u8 - - file\_type - - Zero. There is no file type for this “unused” directory entry. - * - 0x8 - - \_\_le16 - - limit - - Maximum number of dx\_entries that can follow this header, plus 1 for - the header itself. - * - 0xA - - \_\_le16 - - count - - Actual number of dx\_entries that follow this header, plus 1 for the - header itself. - * - 0xE - - \_\_le32 - - block - - The block number (within the directory file) that goes with the lowest - hash value of this block. This value is stored in the parent block. - * - 0x12 - - struct dx\_entry - - entries[0] - - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. - -The hash maps that exist in both ``struct dx_root`` and -``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes -long: - -.. list-table:: - :widths: 1 1 1 77 - :header-rows: 1 - - * - Offset - - Type - - Name - - Description - * - 0x0 - - \_\_le32 - - hash - - Hash code. - * - 0x4 - - \_\_le32 - - block - - Block number (within the directory file, not filesystem blocks) of the - next node in the htree. - -(If you think this is all quite clever and peculiar, so does the -author.) - -If metadata checksums are enabled, the last 8 bytes of the directory -block (precisely the length of one dx\_entry) are used to store a -``struct dx_tail``, which contains the checksum. The ``limit`` and -``count`` entries in the dx\_root/dx\_node structures are adjusted as -necessary to fit the dx\_tail into the block. If there is no space for -the dx\_tail, the user is notified to run e2fsck -D to rebuild the -directory index (which will ensure that there's space for the checksum. -The dx\_tail structure is 8 bytes long and looks like this: - -.. list-table:: - :widths: 1 1 1 77 - :header-rows: 1 - - * - Offset - - Type - - Name - - Description - * - 0x0 - - u32 - - dt\_reserved - - Zero. - * - 0x4 - - \_\_le32 - - dt\_checksum - - Checksum of the htree directory block. - -The checksum is calculated against the FS UUID, the htree index header -(dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in -use, and the tail block (dx\_tail). |