aboutsummaryrefslogtreecommitdiffstats
path: root/fs/f2fs/file.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-12f2fs: remove redundant check from f2fs_setflags_common()Eric Biggers1-8/+1
Now that f2fs_ioc_setflags() and f2fs_ioc_fssetxattr() call the VFS helper functions which check for permission to change the immutable and append-only flags, it's no longer needed to do this check in f2fs_setflags_common() too. So remove it. This is based on a patch from Darrick Wong, but reworked to apply after commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags"). Originally-from: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-12f2fs: use generic checking function for FS_IOC_FSSETXATTREric Biggers1-31/+14
Make the f2fs implementation of FS_IOC_FSSETXATTR use the new VFS helper function vfs_ioc_fssetxattr_check(), and remove the project quota check since it's now done by the helper function. This is based on a patch from Darrick Wong, but reworked to apply after commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags"). Originally-from: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-12f2fs: use generic checking and prep function for FS_IOC_SETFLAGSEric Biggers1-1/+8
Make the f2fs implementation of FS_IOC_SETFLAGS use the new VFS helper function vfs_ioc_setflags_prepare(). This is based on a patch from Darrick Wong, but reworked to apply after commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags"). Originally-from: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-10f2fs: allow all the users to pin a fileJaegeuk Kim1-3/+0
This patch allows users to pin files. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: allocate blocks for pinned fileJaegeuk Kim1-1/+6
This patch allows fallocate to allocate physical blocks for pinned file. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: use generic EFSBADCRC/EFSCORRUPTEDChao Yu1-1/+1
f2fs uses EFAULT as error number to indicate filesystem is corrupted all the time, but generic filesystems use EUCLEAN for such condition, we need to change to follow others. This patch adds two new macros as below to wrap more generic error code macros, and spread them in code. EFSBADCRC EBADMSG /* Bad CRC detected */ EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chao Yu <yuchao0@huawei.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: Use DIV_ROUND_UP() instead of open-codingGeert Uytterhoeven1-3/+3
Replace the open-coded divisions with round-up by calls to the DIV_ROUND_UP() helper macro. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()Joe Perches1-13/+8
- Add and use f2fs_<level> macros - Convert f2fs_msg to f2fs_printk - Remove level from f2fs_printk and embed the level in the format - Coalesce formats and align multi-line arguments - Remove unnecessary duplicate extern f2fs_msg f2fs.h Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: ioctl for removing a range from F2FSQiuyang Sun1-0/+24
This ioctl shrinks a given length (aligned to sections) from end of the main area. Any cursegs and valid blocks will be moved out before invalidating the range. This feature can be used for adjusting partition sizes online. History of the patch: Sahitya Tummala: - Add this ioctl for f2fs_compat_ioctl() as well. - Fix debugfs status to reflect the online resize changes. - Fix potential race between online resize path and allocate new data block path or gc path. Others: - Rename some identifiers. - Add some error handling branches. - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range. - Implement this interface as ext4's, and change the parameter from shrunk bytes to new block count of F2FS. - During resizing, force to empty sit_journal and forbid adding new entries to it, in order to avoid invalid segno in journal after resize. - Reduce sbi->user_block_count before resize starts. - Commit the updated superblock first, and then update in-memory metadata only when the former succeeds. - Target block count must align to sections. - Write checkpoint before and after committing the new superblock, w/o CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if resize fails after the new superblock is committed. - In free_segment_range(), reduce granularity of gc_mutex. - Add protection on curseg migration. - Add freeze_bdev() and thaw_bdev() for resize fs. - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation. - Recover super_block and FS metadata when resize fails. - No need to clear CP_FSCK_FLAG in update_ckpt_flags(). - Clean up the sb and fs metadata update functions for resize_fs. Geert Uytterhoeven: - Use div_u64*() for 64-bit divisions Arnd Bergmann: - Not all architectures support get_user() with a 64-bit argument: ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined! Use copy_from_user() here, this will always work. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-06-21f2fs: separate f2fs i_flags from fs_flags and ext4 i_flagsEric Biggers1-80/+159
f2fs copied all the on-disk i_flags from ext4, and along with it the assumption that the on-disk i_flags are the same as the bits used by FS_IOC_GETFLAGS and FS_IOC_SETFLAGS. This is problematic because reserving an on-disk inode flag in either filesystem's i_flags or in these ioctls effectively reserves it in all the other places too. In fact, most of the "f2fs i_flags" are not used by f2fs at all. Fix this by separating f2fs's i_flags from the ioctl bits and ext4's i_flags. In the process, un-reserve all "f2fs i_flags" that aren't actually supported by f2fs. This included various flags that were not settable at all, as well as various flags that were settable by FS_IOC_SETFLAGS but didn't actually do anything. There's a slight chance we'll need to add some flag(s) back to FS_IOC_SETFLAGS in order to avoid breaking users who expect f2fs to accept some random flag(s). But hopefully such users don't exist. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: add tracepoint for f2fs_filemap_fault()Chao Yu1-0/+2
This patch adds tracepoint for f2fs_filemap_fault(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: introduce DATA_GENERIC_ENHANCEChao Yu1-3/+12
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check whether @blkaddr locates in main area or not. That check is weak, since the block address in range of main area can point to the address which is not valid in segment info table, and we can not detect such condition, we may suffer worse corruption as system continues running. So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check which trigger SIT bitmap check rather than only range check. This patch did below changes as wel: - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr(). - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid. - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check. - spread blkaddr check in: * f2fs_get_node_info() * __read_out_blkaddrs() * f2fs_submit_page_read() * ra_data_block() * do_recover_data() This patch can fix bug reported from bugzilla below: https://bugzilla.kernel.org/show_bug.cgi?id=203215 https://bugzilla.kernel.org/show_bug.cgi?id=203223 https://bugzilla.kernel.org/show_bug.cgi?id=203231 https://bugzilla.kernel.org/show_bug.cgi?id=203235 https://bugzilla.kernel.org/show_bug.cgi?id=203241 = Update by Jaegeuk Kim = DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths. But, xfstest/generic/446 compalins some generated kernel messages saying invalid bitmap was detected when reading a block. The reaons is, when we get the block addresses from extent_cache, there is no lock to synchronize it from truncating the blocks in parallel. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: remove redundant check in f2fs_file_write_iter()Chengguang Xu1-13/+9
We have already checked flag IOCB_DIRECT in the sanity check of flag IOCB_NOWAIT, so don't have to check it again here. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: allow address pointer number of dnode aligning to specified sizeChao Yu1-3/+4
This patch expands scalability of dnode layout, it allows address pointer number of dnode aligning to specified size (now, the size is one byte by default), and later the number can align to compress cluster size (1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making design of compress meta layout simple. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: remove new blank line of f2fs kernel messageChao Yu1-2/+2
Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-16f2fs: add tracepoint for f2fs_file_write_iter()Chao Yu1-9/+19
This patch adds tracepoint for f2fs_file_write_iter(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-05f2fs: Fix use of number of devicesDamien Le Moal1-1/+1
For a single device mount using a zoned block device, the zone information for the device is stored in the sbi->devs single entry array and sbi->s_ndevs is set to 1. This differs from a single device mount using a regular block device which does not allocate sbi->devs and sets sbi->s_ndevs to 0. However, sbi->s_devs == 0 condition is used throughout the code to differentiate a single device mount from a multi-device mount where sbi->s_ndevs is always larger than 1. This results in problems with single zoned block device volumes as these are treated as multi-device mounts but do not have the start_blk and end_blk information set. One of the problem observed is skipping of zone discard issuing resulting in write commands being issued to full zones or unaligned to a zone write pointer. Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single regular block device mount) and sbi->s_ndevs == 1 (single zoned block device mount) in the same manner. This is done by introducing the helper function f2fs_is_multi_device() and using this helper in place of direct tests of sbi->s_ndevs value, improving code readability. Fixes: 7bb3a371d199 ("f2fs: Fix zoned block device support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-15Merge tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fsLinus Torvalds1-22/+24
Pull f2fs updates from Jaegeuk Kim: "We've continued mainly to fix bugs in this round, as f2fs has been shipped in more devices. Especially, we've focused on stabilizing checkpoint=disable feature, and provided some interfaces for QA. Enhancements: - expose FS_NOCOW_FL for pin_file - run discard jobs at unmount time with timeout - tune discarding thread to avoid idling which consumes power - some checking codes to address vulnerabilities - give random value to i_generation - shutdown with more flags for QA Bug fixes: - clean up stale objects when mount is failed along with checkpoint=disable - fix system being stuck due to wrong count by atomic writes - handle some corrupted disk cases - fix a deadlock in f2fs_read_inline_dir We've also added some minor build error fixes and clean-up patches" * tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (53 commits) f2fs: set pin_file under CAP_SYS_ADMIN f2fs: fix to avoid deadlock in f2fs_read_inline_dir() f2fs: fix to adapt small inline xattr space in __find_inline_xattr() f2fs: fix to do sanity check with inode.i_inline_xattr_size f2fs: give some messages for inline_xattr_size f2fs: don't trigger read IO for beyond EOF page f2fs: fix to add refcount once page is tagged PG_private f2fs: remove wrong comment in f2fs_invalidate_page() f2fs: fix to use kvfree instead of kzfree f2fs: print more parameters in trace_f2fs_map_blocks f2fs: trace f2fs_ioc_shutdown f2fs: fix to avoid deadlock of atomic file operations f2fs: fix to dirty inode for i_mode recovery f2fs: give random value to i_generation f2fs: no need to take page lock in readdir f2fs: fix to update iostat correctly in IPU path f2fs: fix encrypted page memory leak f2fs: make fault injection covering __submit_flush_wait() f2fs: fix to retry fill_super only if recovery failed f2fs: silence VM_WARN_ON_ONCE in mempool_alloc ...
2019-03-14f2fs: set pin_file under CAP_SYS_ADMINJaegeuk Kim1-2/+2
Android uses pin_file for uncrypt during OTA, and that should be managed by CAP_SYS_ADMIN only. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-12f2fs: trace f2fs_ioc_shutdownChao Yu1-0/+3
This patch supports to trace f2fs_ioc_shutdown. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-12f2fs: fix to dirty inode for i_mode recoveryChao Yu1-4/+1
As Seulbae Kim reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202637 We didn't recover permission field correctly after sudden power-cut, the reason is in setattr we didn't add inode into global dirty list once i_mode is changed, so latter checkpoint triggered by fsync will not flush last i_mode into disk, result in this problem, fix it. Reported-by: Seulbae Kim <seulbae@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-05f2fs: fix potential data inconsistence of checkpointChao Yu1-8/+6
Previously, we changed lock from cp_rwsem to node_change, it solved the deadlock issue which was caused by below race condition: Thread A Thread B - f2fs_setattr - f2fs_lock_op -- read_lock - dquot_transfer - __dquot_transfer - dquot_acquire - commit_dqblk - f2fs_quota_write - f2fs_write_begin - f2fs_write_failed - write_checkpoint - block_operations - f2fs_lock_all -- write_lock - f2fs_truncate_blocks - f2fs_lock_op -- read_lock But it breaks the sematics of cp_rwsem, in other callers like: - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed - f2fs_direct_IO -> f2fs_write_failed We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit bitmap update, which can cause further data corruption. So this patch reverts previous fix implementation, and try to fix deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed() only for quota file, and keep the preallocated data/node in the tail of quota file, we can expecte that the preallocated space can be used to store quota info latter soon. Fixes: af033b2aa8a8 ("f2fs: guarantee journalled quota data by checkpoint") Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15f2fs: add quick mode of checkpoint=disable for QAJaegeuk Kim1-3/+3
This mode returns mount() quickly with EAGAIN. We can trigger this by shutdown(F2FS_GOING_DOWN_NEED_FSCK). Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-23f2fs: use IS_ENCRYPTED() to check encryption statusChandan Rajendra1-5/+5
This commit removes the f2fs specific f2fs_encrypted_inode() and makes use of the generic IS_ENCRYPTED() macro to check for the encryption status of an inode. Acked-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-01-08f2fs: export FS_NOCOW_FL flag to userJaegeuk Kim1-0/+2
This exports pin_file status to user. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08f2fs: wait on atomic writes to count F2FS_CP_WB_DATAJaegeuk Kim1-5/+7
Otherwise, we can get wrong counts incurring checkpoint hang. IO_W (CP: -24, Data: 24, Flush: ( 0 0 1), Discard: ( 0 0)) Thread A Thread B - f2fs_write_data_pages - __write_data_page - f2fs_submit_page_write - inc_page_count(F2FS_WB_DATA) type is F2FS_WB_DATA due to file is non-atomic one - f2fs_ioc_start_atomic_write - set_inode_flag(FI_ATOMIC_FILE) - f2fs_write_end_io - dec_page_count(F2FS_WB_CP_DATA) type is F2FS_WB_DATA due to file becomes atomic one Cc: <stable@vger.kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26f2fs: check PageWriteback flag for ordered caseChao Yu1-3/+3
For all ordered cases in f2fs_wait_on_page_writeback(), we need to check PageWriteback status, so let's clean up to relocate the check into f2fs_wait_on_page_writeback(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26f2fs: fix to dirty inode synchronouslyChao Yu1-1/+1
If user change inode's i_flags via ioctl, let's add it into global dirty list, so that checkpoint can guarantee its persistence before fsync, it can make checkpoint keeping strong consistency. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-14f2fs: add an ioctl() to explicitly trigger fsck laterJaegeuk Kim1-0/+7
This adds an option in ioctl(F2FS_IOC_SHUTDOWN) in order to trigger fsck by setting a NEED_FSCK flag. Generally, shutdown is used for the test to validate filesystem consistency, and setting NEED_FSCK flag can be used for Android to trigger fsck.f2fs at boot time explicitly so that we could measure the elapsed time as well as force filesystem check. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix m_may_create to make OPU DIO write correctlyJia Zhu1-1/+3
Previously, we added a parameter @map.m_may_create to trigger OPU allocation and call f2fs_balance_fs() correctly. But in get_more_blocks(), @create has been overwritten by below code. So the function f2fs_map_blocks() will not allocate new block address but directly go out. Meanwile,there are several functions calling f2fs_map_blocks() directly and @map.m_may_create not initialized. CODE: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } This patch fixes it. Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix out-place-update DIO writeChao Yu1-1/+2
In get_more_blocks(), we may override @create as below code: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create is 1, so in LFS mode, dio overwrite under LFS mode can easily run out of free segments, result in below panic. Call Trace: allocate_segment_by_default+0xa8/0x270 [f2fs] f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs] __allocate_data_block+0x306/0x480 [f2fs] f2fs_map_blocks+0x6f6/0x920 [f2fs] __get_data_block+0x4f/0xb0 [f2fs] get_data_block_dio_write+0x50/0x60 [f2fs] do_blockdev_direct_IO+0xcd5/0x21e0 __blockdev_direct_IO+0x3a/0x3c f2fs_direct_IO+0x1ff/0x4a0 [f2fs] generic_file_direct_write+0xd9/0x160 __generic_file_write_iter+0xbb/0x1e0 f2fs_file_write_iter+0xaf/0x220 [f2fs] __vfs_write+0xd0/0x130 vfs_write+0xb2/0x1b0 SyS_pwrite64+0x69/0xa0 ? vtime_user_exit+0x29/0x70 do_syscall_64+0x6e/0x160 entry_SYSCALL64_slow_path+0x25/0x25 RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8 So this patch introduces a parameter map.m_may_create to indicate that f2fs_map_blocks() is called from write or read path, which can give the right hint to let f2fs_map_blocks() trigger OPU allocation and call f2fs_balanc_fs() correctly. BTW, it disables physical address preallocation for direct IO in f2fs_preallocate_blocks, which is redundant to OPU allocation of f2fs_map_blocks. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: move dir data flush to write checkpoint processYunlei He1-0/+3
This patch move dir data flush to write checkpoint process, by doing this, it may reduce some time for dir fsync. pre: -f2fs_do_sync_file enter -file_write_and_wait_range <- flush & wait -write_checkpoint -do_checkpoint <- wait all -f2fs_do_sync_file exit now: -f2fs_do_sync_file enter -write_checkpoint -block_operations <- flush dir & no wait -do_checkpoint <- wait all -f2fs_do_sync_file exit Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: change segment to section in f2fs_ioc_gc_rangeYunlong Song1-1/+1
f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves blocks of section each time, so fix it from segment to section. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: introduce __is_large_section() for cleanupChao Yu1-1/+1
Introduce a wrapper __is_large_section() to clean up codes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: clean up f2fs_sb_has_##feature_nameChao Yu1-7/+6
In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to access .raw_super field, to avoid unneeded pointer conversion, this patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly. Just do cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22f2fs: fix to keep project quota consistentChao Yu1-9/+28
This patch does below changes to keep consistence of project quota data in sudden power-cut case: - update inode.i_projid and project quota atomically under lock_op() in f2fs_ioc_setproject() - recover inode.i_projid and project quota in recover_inode() Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22f2fs: guarantee journalled quota data by checkpointChao Yu1-7/+24
For journalled quota mode, let checkpoint to flush dquot dirty data and quota file data to guarntee persistence of all quota sysfile in last checkpoint, by this way, we can avoid corrupting quota sysfile when encountering SPO. The implementation is as below: 1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is cached dquot metadata changes in quota subsystem, and later checkpoint should: a) flush dquot metadata into quota file. b) flush quota file to storage to keep file usage be consistent. 2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota operation failed due to -EIO or -ENOSPC, so later, a) checkpoint will skip syncing dquot metadata. b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a hint for fsck repairing. 3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota data updating is very heavy, it may cause hungtask in block_operation(). To avoid this, if our retry time exceed threshold, let's just skip flushing and retry in next checkpoint(). Signed-off-by: Weichao Guo <guoweichao@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: avoid warnings and set fsck flag] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16f2fs: do not update REQ_TIME in case of error conditionsSahitya Tummala1-1/+1
The REQ_TIME should be updated only in case of success cases as followed at all other places in the file system. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16f2fs: checkpoint disablingDaniel Rosenberg1-1/+11
Note that, it requires "f2fs: return correct errno in f2fs_gc". This adds a lightweight non-persistent snapshotting scheme to f2fs. To use, mount with the option checkpoint=disable, and to return to normal operation, remount with checkpoint=enable. If the filesystem is shut down before remounting with checkpoint=enable, it will revert back to its apparent state when it was first mounted with checkpoint=disable. This is useful for situations where you wish to be able to roll back the state of the disk in case of some critical failure. Signed-off-by: Daniel Rosenberg <drosen@google.com> [Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30f2fs: allow out-place-update for direct IO in LFS modeChao Yu1-1/+2
Normally, DIO uses in-pllace-update, but in LFS mode, f2fs doesn't allow triggering any in-place-update writes, so we fallback direct write to buffered write, result in bad performance of large size write. This patch adds to support triggering out-place-update for direct IO to enhance its performance. Note that it needs to exclude direct read IO during direct write, since new data writing to new block address will no be valid until write finished. storage: zram time xfs_io -f -d /mnt/f2fs/file -c "pwrite 0 1073741824" -c "fsync" Before: real 0m13.061s user 0m0.327s sys 0m12.486s After: real 0m6.448s user 0m0.228s sys 0m6.212s Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30f2fs: refactor ->page_mkwrite() flowChao Yu1-23/+22
Thread A Thread B - f2fs_vm_page_mkwrite - f2fs_setattr - down_write(i_mmap_sem) - truncate_setsize - f2fs_truncate - up_write(i_mmap_sem) - f2fs_reserve_block reserve NEW_ADDR - skip dirty page due to truncation 1. we don't need to rserve new block address for a truncated page. 2. dn.data_blkaddr is used out of node page lock coverage. Refactor ->page_mkwrite() flow to fix above issues: - use __do_map_lock() to avoid racing checkpoint() - lock data page in prior to dnode page - cover f2fs_reserve_block with i_mmap_sem lock - wait page writeback before zeroing page Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12f2fs: add SPDX license identifiersChao Yu1-4/+1
Remove the verbose license text from f2fs files and replace them with SPDX tags. This does not change the license of any of the code. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11f2fs: fix setattr project check upon fssetxattr ioctlWang Shilong1-23/+37
Currently, project quota could be changed by fssetxattr ioctl, and existed permission check inode_owner_or_capable() is obviously not enough, just think that common users could change project id of file, that could make users to break project quota easily. This patch try to follow same regular of xfs project quota: "Project Quota ID state is only allowed to change from within the init namespace. Enforce that restriction only if we are trying to change the quota ID state. Everything else is allowed in user namespaces." Besides that, check and set project id'state should be an atomic operation, protect whole operation with inode lock. Signed-off-by: Wang Shilong <wshilong@ddn.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05f2fs: avoid wrong decrypted data from diskJaegeuk Kim1-2/+1
1. Create a file in an encrypted directory 2. Do GC & drop caches 3. Read stale data before its bio for metapage was not issued yet Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05f2fs: fix to avoid NULL pointer dereference on se->discard_mapChao Yu1-1/+1
https://bugzilla.kernel.org/show_bug.cgi?id=200951 These is a NULL pointer dereference issue reported in bugzilla: Hi, in the setup there is a SATA SSD connected to a SATA-to-USB bridge. The disc is "Samsung SSD 850 PRO 256G" which supports TRIM. There are four partitions: sda1: FAT /boot sda2: F2FS / sda3: F2FS /home sda4: F2FS The bridge is ASMT1153e which uses the "uas" driver. There is no TRIM pass-through, so, when mounting it reports: mounting with "discard" option, but the device does not support discard The USB host is USB3.0 and UASP capable. It is the one on RK3399. Given this everything works fine, except there is no TRIM support. In order to enable TRIM a new UDEV rule is added [1]: /etc/udev/rules.d/10-sata-bridge-trim.rules: ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap" After reboot any F2FS write hangs forever and dmesg reports: Unable to handle kernel NULL pointer dereference Also tested on a x86_64 system: works fine even with TRIM enabled. same disc same bridge different usb host controller different cpu architecture not root filesystem Regards, Vicenç. [1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280 Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122 [000000000000003e] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] SMP Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1 Hardware name: Sapphire-RK3399 Board (DT) pstate: 00000005 (nzcv daif -PAN -UAO) pc : update_sit_entry+0x304/0x4b0 lr : update_sit_entry+0x108/0x4b0 sp : ffff00000ca13bd0 x29: ffff00000ca13bd0 x28: 000000000000003e x27: 0000000000000020 x26: 0000000000080000 x25: 0000000000000048 x24: ffff8000ebb85cf8 x23: 0000000000000253 x22: 00000000ffffffff x21: 00000000000535f2 x20: 00000000ffffffdf x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8 x17: 0000000007ce6926 x16: 000000001c83ffa8 x15: 0000000000000000 x14: ffff8000f602df90 x13: 0000000000000006 x12: 0000000000000040 x11: 0000000000000228 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 00000000000535f2 x6 : ffff8000ebff3440 x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8 x3 : 00000000ffffffff x2 : 0000000000000020 x1 : 0000000000000000 x0 : ffff8000eb9e5800 Process nvim (pid: 957, stack limit = 0x0000000063a78320) Call trace: update_sit_entry+0x304/0x4b0 f2fs_invalidate_blocks+0x98/0x140 truncate_node+0x90/0x400 f2fs_remove_inode_page+0xe8/0x340 f2fs_evict_inode+0x2b0/0x408 evict+0xe0/0x1e0 iput+0x160/0x260 do_unlinkat+0x214/0x298 __arm64_sys_unlinkat+0x3c/0x68 el0_svc_handler+0x94/0x118 el0_svc+0x8/0xc Code: f9400800 b9488400 36080140 f9400f01 (387c4820) ---[ end trace a0f21a307118c477 ]--- The reason is it is possible to enable discard flag on block queue via UDEV, but during mount, f2fs will initialize se->discard_map only if this flag is set, once the flag is set after mount, f2fs may dereference NULL pointer on se->discard_map. So this patch does below changes to fix this issue: - initialize and update se->discard_map all the time. - don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag during mount. - don't issue small discard on zoned block device. - introduce some functions to enhance the readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Vicente Bergas <vicencb@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gcJaegeuk Kim1-57/+62
The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop. If it hits the miximum retrials in GC, let's give a chance to release gc_mutex for a short time in order not to go into live lock in the worst case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14f2fs: rework fault injection handling to avoid a warningArnd Bergmann1-2/+1
When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an unused label: fs/f2fs/segment.c: In function '__submit_discard_cmd': fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label] This could be fixed by adding another #ifdef around it, but the more reliable way of doing this seems to be to remove the other #ifdefs where that is easily possible. By defining time_to_inject() as a trivial stub, most of the checks for CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer formatting of the code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13f2fs: fix avoid race between truncate and background GCChao Yu1-14/+23
Thread A Background GC - f2fs_setattr isize to 0 - truncate_setsize - gc_data_segment - f2fs_get_read_data_page page #0 - set_page_dirty - set_cold_data - f2fs_truncate - f2fs_setattr isize to 4k - read 4k <--- hit data in cached page #0 Above race condition can cause read out invalid data in a truncated page, fix it by i_gc_rwsem[WRITE] lock. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13f2fs: avoid race between zero_range and background GCChao Yu1-2/+9
Thread A Background GC - f2fs_zero_range - truncate_pagecache_range - gc_data_segment - get_read_data_page - move_data_page - set_page_dirty - set_cold_data - f2fs_do_zero_range - dn->data_blkaddr = NEW_ADDR; - f2fs_set_data_blkaddr Actually, we don't need to set dirty & checked flag on the page, since all valid data in the page should be zeroed by zero_range(). Use i_gc_rwsem[WRITE] to avoid such race condition. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-10f2fs: fix to reset i_gc_failures correctlyChao Yu1-1/+1
Let's reset i_gc_failures to zero when we unset pinned state for file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>