aboutsummaryrefslogtreecommitdiffstats
path: root/fs/f2fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-12-01Merge tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds1-0/+1
Pull fs header updates from Christian Brauner: "This contains initial work to start splitting up fs.h. Begin the long-overdue work of splitting up the monolithic fs.h header. The header has grown to over 3000 lines and includes types and functions for many different subsystems, making it difficult to navigate and causing excessive compilation dependencies. This series introduces new focused headers for superblock-related code: - Rename fs_types.h to fs_dirent.h to better reflect its actual content (directory entry types) - Add fs/super_types.h containing superblock type definitions - Add fs/super.h containing superblock function declarations This is the first step in a longer effort to modularize the VFS headers. Cleanups: - Inode Field Layout Optimization (Mateusz Guzik) Move inode fields used during fast path lookup closer together to improve cache locality during path resolution. - current_umask() Optimization (Mateusz Guzik) Inline current_umask() and move it to fs_struct.h. This improves performance by avoiding function call overhead for this frequently-used function, and places it in a more appropriate header since it operates on fs_struct" * tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: move inode fields used during fast path lookup closer together fs: inline current_umask() and move it to fs_struct.h fs: add fs/super.h header fs: add fs/super_types.h header fs: rename fs_types.h to fs_dirent.h
2025-12-01Merge tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds1-1/+1
Pull folio updates from Christian Brauner: "Add a new folio_next_pos() helper function that returns the file position of the first byte after the current folio. This is a common operation in filesystems when needing to know the end of the current folio. The helper is lifted from btrfs which already had its own version, and is now used across multiple filesystems and subsystems: - btrfs - buffer - ext4 - f2fs - gfs2 - iomap - netfs - xfs - mm This fixes a long-standing bug in ocfs2 on 32-bit systems with files larger than 2GiB. Presumably this is not a common configuration, but the fix is backported anyway. The other filesystems did not have bugs, they were just mildly inefficient. This also introduce uoff_t as the unsigned version of loff_t. A recent commit inadvertently changed a comparison from being unsigned (on 64-bit systems) to being signed (which it had always been on 32-bit systems), leading to sporadic fstests failures. Generally file sizes are restricted to being a signed integer, but in places where -1 is passed to indicate "up to the end of the file", it is convenient to have an unsigned type to ensure comparisons are always unsigned regardless of architecture" * tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Add uoff_t mm: Use folio_next_pos() xfs: Use folio_next_pos() netfs: Use folio_next_pos() iomap: Use folio_next_pos() gfs2: Use folio_next_pos() f2fs: Use folio_next_pos() ext4: Use folio_next_pos() buffer: Use folio_next_pos() btrfs: Use folio_next_pos() filemap: Add folio_next_pos()
2025-12-01Merge tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds1-4/+1
Pull writeback updates from Christian Brauner: "Features: - Allow file systems to increase the minimum writeback chunk size. The relatively low minimal writeback size of 4MiB means that written back inodes on rotational media are switched a lot. Besides introducing additional seeks, this also can lead to extreme file fragmentation on zoned devices when a lot of files are cached relative to the available writeback bandwidth. This adds a superblock field that allows the file system to override the default size, and sets it to the zone size for zoned XFS. - Add logging for slow writeback when it exceeds sysctl_hung_task_timeout_secs. This helps identify tasks waiting for a long time and pinpoint potential issues. Recording the starting jiffies is also useful when debugging a crashed vmcore. - Wake up waiting tasks when finishing the writeback of a chunk Cleanups: - filemap_* writeback interface cleanups. Adding filemap_fdatawrite_wbc ended up being a mistake, as all but the original btrfs caller should be using better high level interfaces instead. This series removes all these low-level interfaces, switches btrfs to a more specific interface, and cleans up other too low-level interfaces. With this the writeback_control that is passed to the writeback code is only initialized in three places. - Remove __filemap_fdatawrite, __filemap_fdatawrite_range, and filemap_fdatawrite_wbc - Add filemap_flush_nr helper for btrfs - Push struct writeback_control into start_delalloc_inodes in btrfs - Rename filemap_fdatawrite_range_kick to filemap_flush_range - Stop opencoding filemap_fdatawrite_range in 9p, ocfs2, and mm - Make wbc_to_tag() inline and use it in fs" * tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Make wbc_to_tag() inline and use it in fs. xfs: set s_min_writeback_pages for zoned file systems writeback: allow the file system to override MIN_WRITEBACK_PAGES writeback: cleanup writeback_chunk_size mm: rename filemap_fdatawrite_range_kick to filemap_flush_range mm: remove __filemap_fdatawrite_range mm: remove filemap_fdatawrite_wbc mm: remove __filemap_fdatawrite mm,btrfs: add a filemap_flush_nr helper btrfs: push struct writeback_control into start_delalloc_inodes btrfs: use the local tmp_inode variable in start_delalloc_inodes ocfs2: don't opencode filemap_fdatawrite_range in ocfs2_journal_submit_inode_data_buffers 9p: don't opencode filemap_fdatawrite_range in v9fs_mmap_vm_close mm: don't opencode filemap_fdatawrite_range in filemap_invalidate_inode writeback: Add logging for slow writeback (exceeds sysctl_hung_task_timeout_secs) writeback: Wake up waiting tasks when finishing the writeback of a chunk.
2025-12-01Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds4-5/+5
Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...
2025-11-05fs: inline current_umask() and move it to fs_struct.hMateusz Guzik1-0/+1
There is no good reason to have this as a func call, other than avoiding the churn of adding fs_struct.h as needed. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251104170448.630414-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-31f2fs: Use folio_next_pos()Matthew Wilcox (Oracle)1-1/+1
This is one instruction more efficient than open-coding folio_pos() + folio_size(). It's the equivalent of (x + y) << z rather than x << z + y << z. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://patch.msgid.link/20251024170822.1427218-6-willy@infradead.org Reviewed-by: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29fs: Make wbc_to_tag() inline and use it in fs.Julian Sun1-4/+1
The logic in wbc_to_tag() is widely used in file systems, so modify this function to be inline and use it in file systems. This patch has only passed compilation tests, but it should be fine. Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20f2fs: use the new ->i_state accessorsMateusz Guzik4-5/+5
Change generated with coccinelle and fixed up by hand as appropriate. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-13f2fs: fix wrong block mapping for multi-devicesJaegeuk Kim1-1/+1
Assuming the disk layout as below, disk0: 0 --- 0x00035abfff disk1: 0x00035ac000 --- 0x00037abfff disk2: 0x00037ac000 --- 0x00037ebfff and we want to read data from offset=13568 having len=128 across the block devices, we can illustrate the block addresses like below. 0 .. 0x00037ac000 ------------------- 0x00037ebfff, 0x00037ec000 ------- | ^ ^ ^ | fofs 0 13568 13568+128 | ------------------------------------------------------ | LBA 0x37e8aa9 0x37ebfa9 0x37ec029 --- map 0x3caa9 0x3ffa9 In this example, we should give the relative map of the target block device ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by 0x37ebfff + 1 - 0x37ebfa9. In the below equation, however, map->m_pblk was supposed to be the original address instead of the one from the target block address. - map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk); Cc: stable@vger.kernel.org Fixes: 71f2c8206202 ("f2fs: multidevice: support direct IO") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-10-13f2fs: don't call iput() from f2fs_drop_inode()Mateusz Guzik1-1/+1
iput() calls the problematic routine, which does a ->i_count inc/dec cycle. Undoing it with iput() recurses into the problem. Note f2fs should not be playing games with the refcount to begin with, but that will be handled later. Right now solve the immediate regression. Fixes: bc986b1d756482a ("fs: stop accessing ->i_count directly in f2fs and gfs2") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509301450.138b448f-lkp@intel.com Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-10-03Merge tag 'f2fs-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fsLinus Torvalds15-157/+570
Pull f2fs updates from Jaegeuk Kim: "This focuses on two primary updates for Android devices. First, it sets hash-based file name lookup as the default method to improve performance, while retaining an option to fall back to a linear lookup. Second, it resolves a persistent issue with the 'checkpoint=enable' feature. The update further boosts performance by prefetching node blocks, merging FUA writes more efficiently, and optimizing block allocation policies. The release is rounded out by a comprehensive set of bug fixes that address memory safety, data integrity, and potential system hangs, along with minor documentation and code clean-ups. Enhancements: - add mount option and sysfs entry to tune the lookup mode - dump more information and add a timeout when enabling/disabling checkpoints - readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode - merge FUA command with the existing writes - allocate HOT_DATA for IPU writes - Use allocate_section_policy to control write priority in multi-devices setups - add reserved nodes for privileged users - Add bggc_io_aware to adjust the priority of BG_GC when issuing IO - show the list of donation files Bug fixes: - add missing dput() when printing the donation list - fix UAF issue in f2fs_merge_page_bio() - add sanity check on ei.len in __update_extent_tree_range() - fix infinite loop in __insert_extent_tree() - fix zero-sized extent for precache extents - fix to mitigate overhead of f2fs_zero_post_eof_page() - fix to avoid migrating empty section - fix to truncate first page in error path of f2fs_truncate() - fix to update map->m_next_extent correctly in f2fs_map_blocks() - fix wrong layout information on 16KB page - fix to do sanity check on node footer for non inode dnode - fix to avoid NULL pointer dereference in f2fs_check_quota_consistency() - fix to detect potential corrupted nid in free_nid_list - fix to clear unusable_cap for checkpoint=enable - fix to zero data after EOF for compressed file correctly - fix to avoid overflow while left shift operation - fix condition in __allow_reserved_blocks()" * tag 'f2fs-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (43 commits) f2fs: add missing dput() when printing the donation list f2fs: fix UAF issue in f2fs_merge_page_bio() f2fs: readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode f2fs: add sanity check on ei.len in __update_extent_tree_range() f2fs: fix infinite loop in __insert_extent_tree() f2fs: fix zero-sized extent for precache extents f2fs: fix to mitigate overhead of f2fs_zero_post_eof_page() f2fs: fix to avoid migrating empty section f2fs: fix to truncate first page in error path of f2fs_truncate() f2fs: fix to update map->m_next_extent correctly in f2fs_map_blocks() f2fs: fix wrong layout information on 16KB page f2fs: clean up error handing of f2fs_submit_page_read() f2fs: avoid unnecessary folio_clear_uptodate() for cleanup f2fs: merge FUA command with the existing writes f2fs: allocate HOT_DATA for IPU writes f2fs: Use allocate_section_policy to control write priority in multi-devices setups Documentation: f2fs: Reword title Documentation: f2fs: Indent compression_mode option list Documentation: f2fs: Wrap snippets in literal code blocks Documentation: f2fs: Span write hint table section rows ...
2025-10-03f2fs: add missing dput() when printing the donation listJaegeuk Kim1-0/+1
We missed to call dput() on the grabbed dentry. Fixes: f1a49c1b112b ("f2fs: show the list of donation files") Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-29Merge tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds3-3/+19
Pull vfs inode updates from Christian Brauner: "This contains a series I originally wrote and that Eric brought over the finish line. It moves out the i_crypt_info and i_verity_info pointers out of 'struct inode' and into the fs-specific part of the inode. So now the few filesytems that actually make use of this pay the price in their own private inode storage instead of forcing it upon every user of struct inode. The pointer for the crypt and verity info is simply found by storing an offset to its address in struct fsverity_operations and struct fscrypt_operations. This shrinks struct inode by 16 bytes. I hope to move a lot more out of it in the future so that struct inode becomes really just about very core stuff that we need, much like struct dentry and struct file, instead of the dumping ground it has become over the years. On top of this are a various changes associated with the ongoing inode lifetime handling rework that multiple people are pushing forward: - Stop accessing inode->i_count directly in f2fs and gfs2. They simply should use the __iget() and iput() helpers - Make the i_state flags an enum - Rework the iput() logic Currently, if we are the last iput, and we have the I_DIRTY_TIME bit set, we will grab a reference on the inode again and then mark it dirty and then redo the put. This is to make sure we delay the time update for as long as possible We can rework this logic to simply dec i_count if it is not 1, and if it is do the time update while still holding the i_count reference Then we can replace the atomic_dec_and_lock with locking the ->i_lock and doing atomic_dec_and_test, since we did the atomic_add_unless above - Add an icount_read() helper and convert everyone that accesses inode->i_count directly for this purpose to use the helper - Expand dump_inode() to dump more information about an inode helping in debugging - Add some might_sleep() annotations to iput() and associated helpers" * tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add might_sleep() annotation to iput() and more fs: expand dump_inode() inode: fix whitespace issues fs: add an icount_read helper fs: rework iput logic fs: make the i_state flags an enum fs: stop accessing ->i_count directly in f2fs and gfs2 fsverity: check IS_VERITY() in fsverity_cleanup_inode() fs: remove inode::i_verity_info btrfs: move verity info pointer to fs-specific part of inode f2fs: move verity info pointer to fs-specific part of inode ext4: move verity info pointer to fs-specific part of inode fsverity: add support for info in fs-specific part of inode fs: remove inode::i_crypt_info ceph: move crypt info pointer to fs-specific part of inode ubifs: move crypt info pointer to fs-specific part of inode f2fs: move crypt info pointer to fs-specific part of inode ext4: move crypt info pointer to fs-specific part of inode fscrypt: add support for info in fs-specific part of inode fscrypt: replace raw loads of info pointer with helper function
2025-09-28f2fs: fix UAF issue in f2fs_merge_page_bio()Chao Yu1-1/+1
As JY reported in bugzilla [1], Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 pc : [0xffffffe51d249484] f2fs_is_cp_guaranteed+0x70/0x98 lr : [0xffffffe51d24adbc] f2fs_merge_page_bio+0x520/0x6d4 CPU: 3 UID: 0 PID: 6790 Comm: kworker/u16:3 Tainted: P B W OE 6.12.30-android16-5-maybe-dirty-4k #1 5f7701c9cbf727d1eebe77c89bbbeb3371e895e5 Tainted: [P]=PROPRIETARY_MODULE, [B]=BAD_PAGE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Workqueue: writeback wb_workfn (flush-254:49) Call trace: f2fs_is_cp_guaranteed+0x70/0x98 f2fs_inplace_write_data+0x174/0x2f4 f2fs_do_write_data_page+0x214/0x81c f2fs_write_single_data_page+0x28c/0x764 f2fs_write_data_pages+0x78c/0xce4 do_writepages+0xe8/0x2fc __writeback_single_inode+0x4c/0x4b4 writeback_sb_inodes+0x314/0x540 __writeback_inodes_wb+0xa4/0xf4 wb_writeback+0x160/0x448 wb_workfn+0x2f0/0x5dc process_scheduled_works+0x1c8/0x458 worker_thread+0x334/0x3f0 kthread+0x118/0x1ac ret_from_fork+0x10/0x20 [1] https://bugzilla.kernel.org/show_bug.cgi?id=220575 The panic was caused by UAF issue w/ below race condition: kworker - writepages - f2fs_write_cache_pages - f2fs_write_single_data_page - f2fs_do_write_data_page - f2fs_inplace_write_data - f2fs_merge_page_bio - add_inu_page : cache page #1 into bio & cache bio in io->bio_list - f2fs_write_single_data_page - f2fs_do_write_data_page - f2fs_inplace_write_data - f2fs_merge_page_bio - add_inu_page : cache page #2 into bio which is linked in io->bio_list write - f2fs_write_begin : write page #1 - f2fs_folio_wait_writeback - f2fs_submit_merged_ipu_write - f2fs_submit_write_bio : submit bio which inclues page #1 and #2 software IRQ - f2fs_write_end_io - fscrypt_free_bounce_page : freed bounced page which belongs to page #2 - inc_page_count( , WB_DATA_TYPE(data_folio), false) : data_folio points to fio->encrypted_page the bounced page can be freed before accessing it in f2fs_is_cp_guarantee() It can reproduce w/ below testcase: Run below script in shell #1: for ((i=1;i>0;i++)) do xfs_io -f /mnt/f2fs/enc/file \ -c "pwrite 0 32k" -c "fdatasync" Run below script in shell #2: for ((i=1;i>0;i++)) do xfs_io -f /mnt/f2fs/enc/file \ -c "pwrite 0 32k" -c "fdatasync" So, in f2fs_merge_page_bio(), let's avoid using fio->encrypted_page after commit page into internal ipu cache. Fixes: 0b20fcec8651 ("f2fs: cache global IPU bio") Reported-by: JY <JY.Ho@mediatek.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-28f2fs: readahead node blocks in F2FS_GET_BLOCK_PRECACHE modeYunji Kang1-0/+3
In f2fs_precache_extents(), For large files, It requires reading many node blocks. Instead of reading each node block with synchronous I/O, this patch applies readahead so that node blocks can be fetched in advance. It reduces the overhead of repeated sync reads and improves efficiency when precaching extents of large files. I created a file with the same largest extent and executed the test. For this experiment, I set the file's largest extent with an offset of 0 and a size of 1GB. I configured the remaining area with 100MB extents. 5GB test file: dd if=/dev/urandom of=test1 bs=1m count=5120 cp test1 test2 fsync test1 dd if=test1 of=test2 bs=1m skip=1024 seek=1024 count=100 conv=notrunc dd if=test1 of=test2 bs=1m skip=1224 seek=1224 count=100 conv=notrunc ... dd if=test1 of=test2 bs=1m skip=5024 seek=5024 count=100 conv=notrunc reboot I also created 10GB and 20GB files with large extents using the same method. ioctl(F2FS_IOC_PRECACHE_EXTENTS) test results are as follows: +-----------+---------+---------+-----------+ | File size | Before | After | Reduction | +-----------+---------+---------+-----------+ | 5GB | 101.8ms | 37.0ms | 72.1% | | 10GB | 222.9ms | 56.0ms | 74.9% | | 20GB | 446.2ms | 116.4ms | 73.9% | +-----------+---------+---------+-----------+ Tested on a 256GB mobile device with an SM8750 chipset. Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yunji Kang <yunji0.kang@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-28f2fs: add sanity check on ei.len in __update_extent_tree_range()Chao Yu1-0/+9
Add a sanity check in __update_extent_tree_range() to detect any zero-sized extent update. Signed-off-by: wangzijie <wangzijie1@honor.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-17f2fs: fix infinite loop in __insert_extent_tree()wangzijie1-0/+6
When we get wrong extent info data, and look up extent_node in rb tree, it will cause infinite loop (CONFIG_F2FS_CHECK_FS=n). Avoiding this by return NULL and print some kernel messages in that case. Signed-off-by: wangzijie <wangzijie1@honor.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-17f2fs: fix zero-sized extent for precache extentswangzijie1-3/+4
Script to reproduce: f2fs_io write 1 0 1881 rand dsync testfile f2fs_io fallocate 0 7708672 4096 testfile f2fs_io write 1 1881 1 rand buffered testfile fsync testfile umount mount f2fs_io precache_extents testfile When the data layout is something like this: dnode1: dnode2: [0] A [0] NEW_ADDR [1] A+1 [1] 0x0 ... [1016] A+1016 [1017] B (B!=A+1017) [1017] 0x0 During precache_extents, we map the last block(valid blkaddr) in dnode1: map->m_flags |= F2FS_MAP_MAPPED; map->m_pblk = blkaddr(valid blkaddr); map->m_len = 1; then we goto next_dnode, meet the first block in dnode2(hole), goto sync_out: map->m_flags & F2FS_MAP_MAPPED == true, and we make zero-sized extent: map->m_len = 1 ofs = start_pgofs - map->m_lblk = 1882 - 1881 = 1 ei.fofs = start_pgofs = 1882 ei.len = map->m_len - ofs = 1 - 1 = 0 Rebased on patch[1], this patch can cover these cases to avoid zero-sized extent: A,B,C is valid blkaddr case1: dnode1: dnode2: [0] A [0] NEW_ADDR [1] A+1 [1] 0x0 ... .... [1016] A+1016 [1017] B (B!=A+1017) [1017] 0x0 case2: dnode1: dnode2: [0] A [0] C (C!=B+1) [1] A+1 [1] C+1 ... .... [1016] A+1016 [1017] B (B!=A+1017) [1017] 0x0 case3: dnode1: dnode2: [0] A [0] C (C!=B+2) [1] A+1 [1] C+1 ... .... [1015] A+1015 [1016] B (B!=A+1016) [1017] B+1 [1017] 0x0 [1] https://lore.kernel.org/linux-f2fs-devel/20250912081250.44383-1-chao@kernel.org/ Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS") Signed-off-by: wangzijie <wangzijie1@honor.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-16f2fs: fix to mitigate overhead of f2fs_zero_post_eof_page()Chao Yu1-20/+19
f2fs_zero_post_eof_page() may cuase more overhead due to invalidate_lock and page lookup, change as below to mitigate its overhead: - check new_size before grabbing invalidate_lock - lookup and invalidate pages only in range of [old_size, new_size] Fixes: ba8dac350faf ("f2fs: fix to zero post-eof page") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-16f2fs: fix to avoid migrating empty sectionChao Yu1-1/+15
It reports a bug from device w/ zufs: F2FS-fs (dm-64): Inconsistent segment (173822) type [1, 0] in SSA and SIT F2FS-fs (dm-64): Stopped filesystem due to reason: 4 Thread A Thread B - f2fs_expand_inode_data - f2fs_allocate_pinning_section - f2fs_gc_range - do_garbage_collect w/ segno #x - writepage - f2fs_allocate_data_block - new_curseg - allocate segno #x The root cause is: fallocate on pinning file may race w/ block allocation as above, result in do_garbage_collect() from fallocate() may migrate segment which is just allocated by a log, the log will update segment type in its in-memory structure, however GC will get segment type from on-disk SSA block, once segment type changes by log, we can detect such inconsistency, then shutdown filesystem. In this case, on-disk SSA shows type of segno #173822 is 1 (SUM_TYPE_NODE), however segno #173822 was just allocated as data type segment, so in-memory SIT shows type of segno #173822 is 0 (SUM_TYPE_DATA). Change as below to fix this issue: - check whether current section is empty before gc - add sanity checks on do_garbage_collect() to avoid any race case, result in migrating segment used by log. - btw, it fixes misc issue in printed logs: "SSA and SIT" -> "SIT and SSA". Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-16f2fs: fix to truncate first page in error path of f2fs_truncate()Chao Yu1-1/+9
syzbot reports a bug as below: loop0: detected capacity change from 0 to 40427 F2FS-fs (loop0): Wrong SSA boundary, start(3584) end(4096) blocks(3072) F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): invalid crc value F2FS-fs (loop0): f2fs_convert_inline_folio: corrupted inline inode ino=3, i_addr[0]:0x1601, run fsck to fix. ------------[ cut here ]------------ kernel BUG at fs/inode.c:753! RIP: 0010:clear_inode+0x169/0x190 fs/inode.c:753 Call Trace: <TASK> evict+0x504/0x9c0 fs/inode.c:810 f2fs_fill_super+0x5612/0x6fa0 fs/f2fs/super.c:5047 get_tree_bdev_flags+0x40e/0x4d0 fs/super.c:1692 vfs_get_tree+0x8f/0x2b0 fs/super.c:1815 do_new_mount+0x2a2/0x9e0 fs/namespace.c:3808 do_mount fs/namespace.c:4136 [inline] __do_sys_mount fs/namespace.c:4347 [inline] __se_sys_mount+0x317/0x410 fs/namespace.c:4324 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f During f2fs_evict_inode(), clear_inode() detects that we missed to truncate all page cache before destorying inode, that is because in below path, we will create page #0 in cache, but missed to drop it in error path, let's fix it. - evict - f2fs_evict_inode - f2fs_truncate - f2fs_convert_inline_inode - f2fs_grab_cache_folio : create page #0 in cache - f2fs_convert_inline_folio : sanity check failed, return -EFSCORRUPTED - clear_inode detects that inode->i_data.nrpages is not zero Fixes: 92dffd01790a ("f2fs: convert inline_data when i_size becomes large") Reported-by: syzbot+90266696fe5daacebd35@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68c09802.050a0220.3c6139.000e.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-16f2fs: fix to update map->m_next_extent correctly in f2fs_map_blocks()Chao Yu1-1/+1
Script to reproduce: mkfs.f2fs -O extra_attr,compression /dev/vdb -f mount /dev/vdb /mnt/f2fs -o mode=lfs,noextent_cache cd /mnt/f2fs f2fs_io write 1 0 1024 rand dsync testfile xfs_io testfile -c "fsync" f2fs_io write 1 0 512 rand dsync testfile xfs_io testfile -c "fsync" cd / umount /mnt/f2fs mount /dev/vdb /mnt/f2fs f2fs_io precache_extents /mnt/f2fs/testfile umount /mnt/f2fs Tracepoint output: f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 0, len = 512, blkaddr = 1055744, c_len = 0 f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 513, len = 351, blkaddr = 17921, c_len = 0 f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 864, len = 160, blkaddr = 18272, c_len = 0 During precache_extents, there is off-by-one issue, we should update map->m_next_extent to pgofs rather than pgofs + 1, if last blkaddr is valid and not contiguous to previous extent. Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-15fs: rename generic_delete_inode() and generic_drop_inode()Mateusz Guzik1-1/+1
generic_delete_inode() is rather misleading for what the routine is doing. inode_just_drop() should be much clearer. The new naming is inconsistent with generic_drop_inode(), so rename that one as well with inode_ as the suffix. No functional changes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-09f2fs: fix wrong layout information on 16KB pageJaegeuk Kim1-3/+6
This patch fixes to support different block size. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-09f2fs: clean up error handing of f2fs_submit_page_read()Chao Yu1-33/+8
Below two functions should never fail, clean up error handling in their callers: 1) f2fs_grab_read_bio() in f2fs_submit_page_read() 2) bio_add_folio() in f2fs_submit_page_read() Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-09f2fs: avoid unnecessary folio_clear_uptodate() for cleanupChao Yu1-1/+1
In error path of __get_node_folio(), if the folio is not uptodate, let's avoid unnecessary folio_clear_uptodate() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-09f2fs: merge FUA command with the existing writesJaegeuk Kim1-1/+3
FUA writes can be merged to the existing write IOs. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-02f2fs: allocate HOT_DATA for IPU writesJaegeuk Kim1-1/+2
Let's split IPU writes in hot data area to improve the GC efficiency. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-29f2fs: Use allocate_section_policy to control write priority in multi-devices setupsLiao Yuanhong5-0/+52
Introduces two new sys nodes: allocate_section_hint and allocate_section_policy. The allocate_section_hint identifies the boundary between devices, measured in sections; it defaults to the end of the device for single storage setups, and the end of the first device for multiple storage setups. The allocate_section_policy determines the write strategy, with a default value of 0 for normal sequential write strategy. A value of 1 prioritizes writes before the allocate_section_hint, while a value of 2 prioritizes writes after it. This strategy addresses the issue where, despite F2FS supporting multiple devices, SOC vendors lack multi-devices support (currently only supporting zoned devices). As a workaround, multiple storage devices are mapped to a single dm device. Both this workaround and the F2FS multi-devices solution may require prioritizing writing to certain devices, such as a device with better performance or when switching is needed due to performance degradation near a device's end. For scenarios with more than two devices, sort them at mount time to utilize this feature. When using this feature with a single storage device, it has almost no impact. However, for configurations where multiple storage devices are mapped to the same dm device using F2FS, utilizing this feature can provide some optimization benefits. Therefore, I believe it should not be limited to just multi-devices usage. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-28f2fs: fix to do sanity check on node footer for non inode dnodeChao Yu5-23/+46
As syzbot reported below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/file.c:1243! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5354 Comm: syz.0.0 Not tainted 6.17.0-rc1-syzkaller-00211-g90d970cade8e #0 PREEMPT(full) RIP: 0010:f2fs_truncate_hole+0x69e/0x6c0 fs/f2fs/file.c:1243 Call Trace: <TASK> f2fs_punch_hole+0x2db/0x330 fs/f2fs/file.c:1306 f2fs_fallocate+0x546/0x990 fs/f2fs/file.c:2018 vfs_fallocate+0x666/0x7e0 fs/open.c:342 ksys_fallocate fs/open.c:366 [inline] __do_sys_fallocate fs/open.c:371 [inline] __se_sys_fallocate fs/open.c:369 [inline] __x64_sys_fallocate+0xc0/0x110 fs/open.c:369 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f1e65f8ebe9 w/ a fuzzed image, f2fs may encounter panic due to it detects inconsistent truncation range in direct node in f2fs_truncate_hole(). The root cause is: a non-inode dnode may has the same footer.ino and footer.nid, so the dnode will be parsed as an inode, then ADDRS_PER_PAGE() may return wrong blkaddr count which may be 923 typically, by chance, dn.ofs_in_node is equal to 923, then count can be calculated to 0 in below statement, later it will trigger panic w/ f2fs_bug_on(, count == 0 || ...). count = min(end_offset - dn.ofs_in_node, pg_end - pg_start); This patch introduces a new node_type NODE_TYPE_NON_INODE, then allowing passing the new_type to sanity_check_node_footer in f2fs_get_node_folio() to detect corruption that a non-inode dnode has the same footer.ino and footer.nid. Scripts to reproduce: mkfs.f2fs -f /dev/vdb mount /dev/vdb /mnt/f2fs touch /mnt/f2fs/foo touch /mnt/f2fs/bar dd if=/dev/zero of=/mnt/f2fs/foo bs=1M count=8 umount /mnt/f2fs inject.f2fs --node --mb i_nid --nid 4 --idx 0 --val 5 /dev/vdb mount /dev/vdb /mnt/f2fs xfs_io /mnt/f2fs/foo -c "fpunch 6984k 4k" Cc: stable@kernel.org Reported-by: syzbot+b9c7ffd609c3f09416ab@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68a68e27.050a0220.1a3988.0002.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-27fs: stop accessing ->i_count directly in f2fs and gfs2Josef Bacik1-2/+2
Instead of accessing ->i_count directly in these file systems, use the appropriate __iget and iput helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/b8e6eb8a3e690ce082828d3580415bf70dfa93aa.1755806649.git.josef@toxicpanda.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-21f2fs: move verity info pointer to fs-specific part of inodeEric Biggers3-0/+8
Move the fsverity_info pointer into the filesystem-specific part of the inode by adding the field f2fs_inode_info::i_verity_info and configuring fsverity_operations::inode_info_offs accordingly. This is a prerequisite for a later commit that removes inode::i_verity_info, saving memory and improving cache efficiency on filesystems that don't support fsverity. Co-developed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/20250810075706.172910-11-ebiggers@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-21f2fs: move crypt info pointer to fs-specific part of inodeEric Biggers2-1/+9
Move the fscrypt_inode_info pointer into the filesystem-specific part of the inode by adding the field f2fs_inode_info::i_crypt_info and configuring fscrypt_operations::inode_info_offs accordingly. This is a prerequisite for a later commit that removes inode::i_crypt_info, saving memory and improving cache efficiency with filesystems that don't support fscrypt. Co-developed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/20250810075706.172910-5-ebiggers@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-20f2fs: show the list of donation filesJaegeuk Kim1-0/+64
This patch introduces a proc entry to show the currently enrolled donation files. - "File path" indicates a file. - "Status" a. "Donated" means the file is registed in the donation list by fadvise(offset, length, POSIX_FADV_NOREUSE) b. "Evicted" means the donated pages were reclaimed. - "Offset (kb)" and "Length (kb) show the registered donation range. - "Cached pages (kb)" shows the amount of cached pages in the inode page cache. For example, # of files : 2 File path Status Donation offset (kb) Donation size (kb) File cached size (kb) --- /local/test2 Donated 0 1048576 2097152 /local/test Evicted 0 1048576 1048576 Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: fix to allow removing qf_nameChao Yu1-2/+5
The mount behavior changed after commit d18535132523 ("f2fs: separate the options parsing and options checking"), let's fix it. [Scripts] mkfs.f2fs -f /dev/vdb mount -t f2fs -o usrquota /dev/vdb /mnt/f2fs quotacheck -uc /mnt/f2fs umount /mnt/f2fs mount -t f2fs -o usrjquota=aquota.user,jqfmt=vfsold /dev/vdb /mnt/f2fs mount|grep f2fs mount -t f2fs -o remount,usrjquota=,jqfmt=vfsold /dev/vdb /mnt/f2fs mount|grep f2fs dmesg [Before commit] mount#1: ...,quota,jqfmt=vfsold,usrjquota=aquota.user,... mount#2: ...,quota,jqfmt=vfsold,... kmsg: no output [After commit] mount#1: ...,quota,jqfmt=vfsold,usrjquota=aquota.user,... mount#2: ...,quota,jqfmt=vfsold,usrjquota=aquota.user,... kmsg: "user quota file already specified" [After patch] mount#1: ...,quota,jqfmt=vfsold,usrjquota=aquota.user,... mount#2: ...,quota,jqfmt=vfsold,... kmsg: "remove qf_name aquota.user" Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Cc: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: fix to avoid NULL pointer dereference in f2fs_check_quota_consistency()Chao Yu1-1/+2
syzbot reported a f2fs bug as below: Oops: gen[ 107.736417][ T5848] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 5848 Comm: syz-executor263 Tainted: G W 6.17.0-rc1-syzkaller-00014-g0e39a731820a #0 PREEMPT_{RT,(full)} RIP: 0010:strcmp+0x3c/0xc0 lib/string.c:284 Call Trace: <TASK> f2fs_check_quota_consistency fs/f2fs/super.c:1188 [inline] f2fs_check_opt_consistency+0x1378/0x2c10 fs/f2fs/super.c:1436 __f2fs_remount fs/f2fs/super.c:2653 [inline] f2fs_reconfigure+0x482/0x1770 fs/f2fs/super.c:5297 reconfigure_super+0x224/0x890 fs/super.c:1077 do_remount fs/namespace.c:3314 [inline] path_mount+0xd18/0xfe0 fs/namespace.c:4112 do_mount fs/namespace.c:4133 [inline] __do_sys_mount fs/namespace.c:4344 [inline] __se_sys_mount+0x317/0x410 fs/namespace.c:4321 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The direct reason is f2fs_check_quota_consistency() may suffer null-ptr-deref issue in strcmp(). The bug can be reproduced w/ below scripts: mkfs.f2fs -f /dev/vdb mount -t f2fs -o usrquota /dev/vdb /mnt/f2fs quotacheck -uc /mnt/f2fs/ umount /mnt/f2fs mount -t f2fs -o usrjquota=aquota.user,jqfmt=vfsold /dev/vdb /mnt/f2fs mount -t f2fs -o remount,usrjquota=,jqfmt=vfsold /dev/vdb /mnt/f2fs umount /mnt/f2fs So, before old_qname and new_qname comparison, we need to check whether they are all valid pointers, fix it. Reported-by: syzbot+d371efea57d5aeab877b@syzkaller.appspotmail.com Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Closes: https://lore.kernel.org/linux-f2fs-devel/689ff889.050a0220.e29e5.0037.GAE@google.com Cc: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: clean up w/ get_left_section_blocks()Chao Yu1-16/+12
Introduce get_left_section_blocks() for cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: add reserved nodes for privileged usersChunhai Guo2-12/+48
This patch allows privileged users to reserve nodes via the 'reserve_node' mount option, which is similar to the existing 'reserve_root' option. "-o reserve_node=<N>" means <N> nodes are reserved for privileged users only. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: Add bggc_io_aware to adjust the priority of BG_GC when issuing IOLiao Yuanhong3-7/+22
Currently, we have encountered some issues while testing ZUFS. In situations near the storage limit (e.g., 50GB remaining), and after simulating fragmentation by repeatedly writing and deleting data, we found that application installation and startup tests conducted after idling for a few minutes take significantly longer several times that of traditional UFS. Tracing the operations revealed that the majority of I/Os were issued by background GC, which blocks normal I/O operations. Under normal circumstances, ZUFS indeed requires more background GC and employs a more aggressive GC strategy. However, I aim to find a way to minimize the impact on regular I/O operations under these near-limit conditions. To address this, I have introduced a bggc_io_aware feature, which controls the prioritization of background GC in the presence of I/Os. This switch can be adjusted at the framework level to implement different strategies. If set to AWARE_ALL_IO, all background GC operations will be skipped during active I/O issuance. The default option remains consistent with the current strategy, ensuring no change in behavior. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: dump more information for f2fs_{enable,disable}_checkpoint()Chao Yu1-0/+16
Changes as below: - print more logs for f2fs_{enable,disable}_checkpoint() - account and dump time stats for f2fs_enable_checkpoint() Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: add timeout in f2fs_enable_checkpoint()Chao Yu2-6/+17
During f2fs_enable_checkpoint() in remount(), if we flush a large amount of dirty pages into slow device, it may take long time which will block write IO, let's add a timeout machanism during dirty pages flush to avoid long time block in f2fs_enable_checkpoint(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: fix to detect potential corrupted nid in free_nid_listChao Yu1-1/+16
As reported, on-disk footer.ino and footer.nid is the same and out-of-range, let's add sanity check on f2fs_alloc_nid() to detect any potential corruption in free_nid_list. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-20f2fs: fix to clear unusable_cap for checkpoint=enableChao Yu1-0/+4
mount -t f2fs -o checkpoint=disable:10% /dev/vdb /mnt/f2fs/ mount -t f2fs -o remount,checkpoint=enable /dev/vdb /mnt/f2fs/ kernel log: F2FS-fs (vdb): Adjust unusable cap for checkpoint=disable = 204440 / 10% If we has assigned checkpoint=enable mount option, unusable_cap{,_perc} parameters of checkpoint=disable should be reset, then calculation and log print could be avoid in adjust_unusable_cap_perc(). Fixes: 1ae18f71cb52 ("f2fs: fix checkpoint=disable:%u%%") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-11f2fs: clean up f2fs_truncate_partial_cluster()Chao Yu1-23/+19
Clean up codes as below: - avoid unnecessary "err > 0" check condition - use "1 << log_cluster_size" instead of F2FS_I(inode)->i_cluster_size No logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-11f2fs: fix to zero data after EOF for compressed file correctlyChao Yu1-7/+16
generic/091 may fail, then it bisects to the bad commit ba8dac350faf ("f2fs: fix to zero post-eof page"). What will cause generic/091 to fail is something like below Testcase #1: 1. write 16k as compressed blocks 2. truncate to 12k 3. truncate to 20k 4. verify data in range of [12k, 16k], however data is not zero as expected Script of Testcase #1 mkfs.f2fs -f -O extra_attr,compression /dev/vdb mount -t f2fs -o compress_extension=* /dev/vdb /mnt/f2fs dd if=/dev/zero of=/mnt/f2fs/file bs=12k count=1 dd if=/dev/random of=/mnt/f2fs/file bs=4k count=1 seek=3 conv=notrunc sync truncate -s $((12*1024)) /mnt/f2fs/file truncate -s $((20*1024)) /mnt/f2fs/file dd if=/mnt/f2fs/file of=/mnt/f2fs/data bs=4k count=1 skip=3 od /mnt/f2fs/data umount /mnt/f2fs Analisys: in step 2), we will redirty all data pages from #0 to #3 in compressed cluster, and zero page #3, in step 3), f2fs_setattr() will call f2fs_zero_post_eof_page() to drop all page cache post eof, includeing dirtied page #3, in step 4) when we read data from page #3, it will decompressed cluster and extra random data to page #3, finally, we hit the non-zeroed data post eof. However, the commit ba8dac350faf ("f2fs: fix to zero post-eof page") just let the issue be reproduced easily, w/o the commit, it can reproduce this bug w/ below Testcase #2: 1. write 16k as compressed blocks 2. truncate to 8k 3. truncate to 12k 4. truncate to 20k 5. verify data in range of [12k, 16k], however data is not zero as expected Script of Testcase #2 mkfs.f2fs -f -O extra_attr,compression /dev/vdb mount -t f2fs -o compress_extension=* /dev/vdb /mnt/f2fs dd if=/dev/zero of=/mnt/f2fs/file bs=12k count=1 dd if=/dev/random of=/mnt/f2fs/file bs=4k count=1 seek=3 conv=notrunc sync truncate -s $((8*1024)) /mnt/f2fs/file truncate -s $((12*1024)) /mnt/f2fs/file truncate -s $((20*1024)) /mnt/f2fs/file echo 3 > /proc/sys/vm/drop_caches dd if=/mnt/f2fs/file of=/mnt/f2fs/data bs=4k count=1 skip=3 od /mnt/f2fs/data umount /mnt/f2fs Anlysis: in step 2), we will redirty all data pages from #0 to #3 in compressed cluster, and zero page #2 and #3, in step 3), we will truncate page #3 in page cache, in step 4), expand file size, in step 5), hit random data post eof w/ the same reason in Testcase #1. Root Cause: In f2fs_truncate_partial_cluster(), after we truncate partial data block on compressed cluster, all pages in cluster including the one post eof will be dirtied, after another tuncation, dirty page post eof will be dropped, however on-disk compressed cluster is still valid, it may include non-zero data post eof, result in exposing previous non-zero data post eof while reading. Fix: In f2fs_truncate_partial_cluster(), let change as below to fix: - call filemap_write_and_wait_range() to flush dirty page - call truncate_pagecache() to drop pages or zero partial page post eof - call f2fs_do_truncate_blocks() to truncate non-compress cluster to last valid block Fixes: 3265d3db1f16 ("f2fs: support partial truncation on compressed inode") Reported-by: Jan Prusakowski <jprusakowski@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-11f2fs: fix to avoid overflow while left shift operationChao Yu1-1/+1
Should cast type of folio->index from pgoff_t to loff_t to avoid overflow while left shift operation. Fixes: 3265d3db1f16 ("f2fs: support partial truncation on compressed inode") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-11f2fs: add sysfs entry for effective lookup modeDaniel Lee1-0/+18
This commit introduces a new read-only sysfs entry at /sys/fs/f2fs/<device>/effective_lookup_mode. This entry displays the actual directory lookup mode F2FS is currently using. This is needed for debugging and verification, as the behavior is determined by both on-disk flags and mount options. Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-11f2fs: add lookup_mode mount optionDaniel Lee3-1/+48
For casefolded directories, f2fs may fall back to a linear search if a hash-based lookup fails. This can cause severe performance regressions. While this behavior can be controlled by userspace tools (e.g. mkfs, fsck) by setting an on-disk flag, a kernel-level solution is needed to guarantee the lookup behavior regardless of the on-disk state. This commit introduces the 'lookup_mode' mount option to provide this kernel-side control. The option accepts three values: - perf: (Default) Enforces a hash-only lookup. The linear fallback is always disabled. - compat: Enables the linear search fallback for compatibility with directory entries from older kernels. - auto: Determines the mode based on the on-disk flag, preserving the userspace-based behavior. Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-11f2fs: add error checking in do_write_page()mason.zhang1-2/+8
Otherwise, the filesystem may unaware of potential file corruption. Signed-off-by: mason.zhang <masonzhang.linuxer@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-11f2fs: fix condition in __allow_reserved_blocks()Chao Yu1-3/+1
If reserve_root mount option is not assigned, __allow_reserved_blocks() will return false, it's not correct, fix it. Fixes: 7e65be49ed94 ("f2fs: add reserved blocks for root user") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>