path: root/fs/ext4
Age | Commit message | Author | Files | Lines
2011-09-21 | Merge branch 'for-linus' of git://git.kernel.dk/linux-block | Linus Torvalds | 2 | -3/+4
* 'for-linus' of git://git.kernel.dk/linux-block:
  floppy: use del_timer_sync() in init cleanup
  blk-cgroup: be able to remove the record of unplugged device
  block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request
  mm: Add comment explaining task state setting in bdi_forker_thread()
  mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread()
  block: simplify force plug flush code a little bit
  block: change force plug flush call order
  block: Fix queue_flag update when rq_affinity goes from 2 to 1
  block: separate priority boosting from REQ_META
  block: remove READ_META and WRITE_META
  xen-blkback: fixed indentation and comments
  xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
2011-08-31 | ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining | Jiaying Zhang | 3 | -4/+18
The i_mutex lock and flush_completed_IO() added by commit 2581fdc810 in ext4_evict_inode() cause lockdep to complain about potential deadlocks in several places. In most/all of these LOCKDEP complaints it looks like it's a false positive, since many of the potential circular locking cases can't take place by the time ext4_evict_inode() is called; but since at the very least it may mask real problems, we need to address this. This change removes the flush_completed_IO() and i_mutex lock in ext4_evict_inode(). Instead, we take a different approach to resolve the software lockup that commit 2581fdc810 intends to fix. Rather than having the ext4-dio-unwritten thread wait to grab the i_mutex lock of an inode, we use mutex_trylock() instead, and simply requeue the work item if we fail to grab the inode's i_mutex lock. This should speed up work queue processing in general, and it also prevents the following deadlock scenario: During a page fault that happens while some process is writing inode A, shrink_icache_memory is called, which in turn evicts another inode B. Inode B has some pending io_end work, so it calls ext4_ioend_wait(), which waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A is still held from before the page fault happened, we enter a deadlock. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
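A minimal userspace model of the trylock-and-requeue approach, for illustration only: struct fake_inode, struct workqueue, process_queue() and the C11 atomic_bool standing in for mutex_trylock() on i_mutex are all hypothetical, not ext4 code. When the worker cannot take an inode's lock it puts the item back at the tail of the queue instead of sleeping, so inode B's work queued behind inode A's still completes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with an i_mutex. */
struct fake_inode {
    atomic_bool i_mutex;   /* true == locked */
    int ioend_done;        /* number of io_end items completed */
};

static void fake_inode_init(struct fake_inode *inode)
{
    atomic_init(&inode->i_mutex, false);
    inode->ioend_done = 0;
}

/* Analogue of mutex_trylock(): true if the lock was acquired. */
static bool fake_trylock(struct fake_inode *inode)
{
    return !atomic_exchange(&inode->i_mutex, true);
}

static void fake_unlock(struct fake_inode *inode)
{
    atomic_store(&inode->i_mutex, false);
}

#define QCAP 16

/* FIFO ring of pending io_end work items, one inode pointer each. */
struct workqueue {
    struct fake_inode *items[QCAP];
    size_t head, count;
};

static void queue_work(struct workqueue *wq, struct fake_inode *inode)
{
    assert(wq->count < QCAP);
    wq->items[(wq->head + wq->count) % QCAP] = inode;
    wq->count++;
}

/*
 * Process at most `budget` items. An item whose inode lock is busy is
 * requeued at the tail instead of waited on, so it cannot block the
 * items behind it. Returns how many items were requeued.
 */
static size_t process_queue(struct workqueue *wq, size_t budget)
{
    size_t requeued = 0;

    while (budget > 0 && wq->count > 0) {
        struct fake_inode *inode = wq->items[wq->head];

        budget--;
        wq->head = (wq->head + 1) % QCAP;
        wq->count--;
        if (!fake_trylock(inode)) {
            queue_work(wq, inode);   /* lock busy: retry later */
            requeued++;
            continue;
        }
        inode->ioend_done++;         /* "convert" the unwritten extent */
        fake_unlock(inode);
    }
    return requeued;
}
```

With inode A's lock held (as in the page-fault scenario), A's item is requeued and B's item still runs. Once A's lock is released, a later pass finishes A's work.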
2011-08-23 | block: separate priority boosting from REQ_META | Christoph Hellwig | 2 | -3/+4
Add a new REQ_PRIO to let requests preempt others in the cfq I/O scheduler, and leave REQ_META purely for marking requests as metadata in blktrace. All existing callers of REQ_META except for XFS are updated to also set REQ_PRIO for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-23 | block: remove READ_META and WRITE_META | Christoph Hellwig | 2 | -3/+3
Replace all occurrences of the undocumented READ_META with READ | REQ_META and remove the unused WRITE_META define. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-21 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 5 | -9/+37
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: flush any pending end_io requests before DIO reads w/dioread_nolock
  ext4: fix nomblk_io_submit option so it correctly converts uninit blocks
  ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.
  ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode
  ext4: Fix ext4_should_writeback_data() for no-journal mode
2011-08-19 | ext4: flush any pending end_io requests before DIO reads w/dioread_nolock | Jiaying Zhang | 1 | -2/+7
There is a race between ext4 buffer write and direct_IO read with dioread_nolock mount option enabled. The problem is that we clear PageWriteback flag during end_io time but will do uninitialized-to-initialized extent conversion later with dioread_nolock. If an O_direct read request comes in during this period, ext4 will return zero instead of the recently written data. This patch checks whether there are any pending uninitialized-to-initialized extent conversion requests before doing O_direct read to close the race. Note that this is just a bandaid fix. The fundamental issue is that we clear PageWriteback flag before we really complete an IO, which is problem-prone. To fix the fundamental issue, we may need to implement an extent tree cache that we can use to look up pending to-be-converted extents. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
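A toy model of the race and of the bandaid fix, for illustration only. The hypothetical struct model_inode tracks one byte of data per block, an "unwritten" flag per extent, and a queue of deferred unwritten-to-initialized conversions. With flush_first false, the O_DIRECT read sees the stale unwritten state and returns zero; with flush_first true, pending conversions are applied before reading, which is roughly what checking for pending conversions before the read achieves.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLK 4
#define QMAX 16

/* Hypothetical model, not ext4 code: one byte of data per block. */
struct model_inode {
    char disk[NBLK];        /* what the write I/O has put on the device */
    bool unwritten[NBLK];   /* extent still marked uninitialized */
    int pending[QMAX];      /* deferred unwritten->initialized conversions */
    int npending;
};

/* Start with a fully preallocated (unwritten) file. */
static void model_init(struct model_inode *m)
{
    memset(m, 0, sizeof(*m));
    for (int i = 0; i < NBLK; i++)
        m->unwritten[i] = true;
}

/* Buffered write completes: the data is on disk and PageWriteback would be
 * cleared here, but with dioread_nolock the conversion is only queued. */
static void write_block(struct model_inode *m, int blk, char c)
{
    assert(m->npending < QMAX);
    m->disk[blk] = c;
    m->pending[m->npending++] = blk;
}

/* Analogue of running the queued end_io work: apply the conversions. */
static void flush_pending(struct model_inode *m)
{
    for (int i = 0; i < m->npending; i++)
        m->unwritten[m->pending[i]] = false;
    m->npending = 0;
}

/* O_DIRECT read: an extent still marked uninitialized reads as zero. */
static char dio_read(struct model_inode *m, int blk, bool flush_first)
{
    if (flush_first && m->npending > 0)
        flush_pending(m);   /* the fix: close the window before reading */
    return m->unwritten[blk] ? 0 : m->disk[blk];
}
```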
2011-08-13 | ext4: fix nomblk_io_submit option so it correctly converts uninit blocks | Theodore Ts'o | 1 | -1/+6
Bug discovered by Jan Kara: Finally, commit 1449032be17abb69116dbc393f67ceb8bd034f92 restored the old IO submission code, but apparently it forgot to restore the old handling of uninitialized buffers, so we unconditionally call block_write_full_page() without specifying an end_io function. So AFAICS we never convert unwritten extents to written in some cases. For example, when I mount the fs as:

    mount -t ext4 -o nomblk_io_submit,dioread_nolock /dev/ubdb /mnt

and do

    int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
    char buf[1024];
    memset(buf, 'a', sizeof(buf));
    fallocate(fd, 0, 0, 16384);
    write(fd, buf, sizeof(buf));

I get a file full of zeros (after remounting the filesystem so that pagecache is dropped) instead of seeing the first KB contain 'a's. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
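Jan Kara's reproducer, wrapped into a self-contained function as a sketch; write_after_fallocate() and the path used with it are hypothetical, and _GNU_SOURCE is needed for fallocate(). Note that the read-back goes through the page cache, so on any kernel this should return 1; observing the bug requires an affected 2011 kernel mounted with nomblk_io_submit,dioread_nolock and a remount (to drop the page cache) before the file is read.

```c
#define _GNU_SOURCE          /* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical wrapper around the reproducer. Returns 1 if the first KB
 * reads back as 'a', 0 if it does not, -1 on an error.
 */
static int write_after_fallocate(const char *path)
{
    char buf[1024], check[1024];
    int ret = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 'a', sizeof(buf));
    /* Preallocate 16 KiB (on ext4: unwritten extents); tolerate
     * filesystems that do not support fallocate() at all. */
    if (fallocate(fd, 0, 0, 16384) < 0 && errno != EOPNOTSUPP)
        goto out;
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto out;
    if (fsync(fd) < 0)
        goto out;
    /* Reads through the page cache; a remount is needed to see the bug. */
    if (pread(fd, check, sizeof(check), 0) != (ssize_t)sizeof(check))
        goto out;
    ret = memcmp(buf, check, sizeof(buf)) == 0;
out:
    close(fd);
    return ret;
}
```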
2011-08-13 | ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. | Tao Ma | 2 | -3/+12
Setting the EXT4_IO_END_UNWRITTEN flag and increasing i_aiodio_unwritten should be done together, since ext4_end_io_nolock always clears the flag and decreases the counter at the same time. We didn't increase i_aiodio_unwritten when setting EXT4_IO_END_UNWRITTEN, so it goes negative and causes some processes to wait forever. Part of the patch came from Eric in his e-mail, but it doesn't actually fix the problem Michael hit. http://marc.info/?l=linux-ext4&m=131316851417460&w=2 Reported-and-Tested-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
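A simplified model of the invariant the fix restores, assuming hypothetical struct model_io_end and struct model_inode types and a waiter that, like the hung process, only finishes once the counter is back to zero. Setting the flag without increasing the counter lets the later decrease drive it to -1, and the waiter never finishes.

```c
#include <assert.h>
#include <stdbool.h>

#define IO_END_UNWRITTEN 0x1u   /* stand-in for EXT4_IO_END_UNWRITTEN */

struct model_inode { int aiodio_unwritten; };   /* i_aiodio_unwritten */
struct model_io_end { unsigned flags; struct model_inode *inode; };

/* Fixed pattern: the flag and the counter always change together. */
static void mark_unwritten(struct model_io_end *io)
{
    if (!(io->flags & IO_END_UNWRITTEN)) {
        io->flags |= IO_END_UNWRITTEN;
        io->inode->aiodio_unwritten++;
    }
}

/* Buggy pattern from the commit: flag set, counter untouched. */
static void mark_unwritten_buggy(struct model_io_end *io)
{
    io->flags |= IO_END_UNWRITTEN;
}

/* Mirrors the description of ext4_end_io_nolock(): clear and decrease. */
static void end_io(struct model_io_end *io)
{
    if (io->flags & IO_END_UNWRITTEN) {
        io->flags &= ~IO_END_UNWRITTEN;
        io->inode->aiodio_unwritten--;
    }
}

/* A waiter sleeps until the counter is exactly zero. */
static bool waiter_can_finish(const struct model_inode *inode)
{
    return inode->aiodio_unwritten == 0;
}
```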
2011-08-13 | ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode | Jiaying Zhang | 2 | -1/+6
Flush the inode's i_completed_io_list before calling ext4_io_wait to prevent the following deadlock scenario: A page fault happens while some process is writing inode A. During the page fault, shrink_icache_memory is called, which in turn evicts another inode B. Inode B has some pending io_end work, so it calls ext4_ioend_wait(), which waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A is still held from before the page fault happened, we enter a deadlock. Also moves ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode() to ext4_evict_inode(). During inode deletion, ext4_evict_inode() is called before ext4_destroy_inode(), and in ext4_evict_inode() we may call ext4_truncate() without holding the i_mutex lock. As a result, there is a race between flush_completed_IO(), which is called from ext4_ext_truncate(), and ext4_end_io_work, which may corrupt an io_end structure. This change moves ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode() to ext4_evict_inode() to resolve the race between ext4_truncate() and ext4_end_io_work during inode deletion. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-13 | ext4: Fix ext4_should_writeback_data() for no-journal mode | Curt Wohlgemuth | 2 | -2/+6
ext4_should_writeback_data() had an incorrect sequence of tests to determine if it should return 0 or 1: in particular, even in no-journal mode, 0 was being returned for a non-regular-file inode. This meant that, in no-journal mode, we would use ext4_journalled_aops for directories, symlinks, and other non-regular files. However, calling journalled aop callbacks when there is no valid handle can cause problems. This would cause a kernel crash with Jan Kara's commit 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data"), because we now dereference 'handle' in ext4_journalled_write_end(). I also added BUG_ONs to check for a valid handle in the obviously journal-only aops callbacks. I tested this running xfstests with a scratch device in these modes:
- no-journal
- data=ordered
- data=writeback
- data=journal
All work fine; the data=journal run has many failures and a crash in xfstests 074, but this is no different from a vanilla kernel. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
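A hypothetical model of the two test orderings; struct model_inode, the function names and the shared tail checks are illustrative assumptions, not the real ext4 helpers. The only difference is whether the no-journal test comes before or after the regular-file test; returning false is what led to journalled aops being chosen.

```c
#include <assert.h>
#include <stdbool.h>

enum data_mode { MODE_JOURNAL, MODE_ORDERED, MODE_WRITEBACK };

/* Hypothetical summary of the inode and mount state. */
struct model_inode {
    bool has_journal;
    bool is_regular_file;
    bool journal_data_flag;   /* per-inode data-journalling flag */
    enum data_mode mount_mode;
};

/* Assumed remaining tests, shared by both orderings. */
static bool tail_checks(const struct model_inode *i)
{
    if (i->journal_data_flag)
        return false;
    return i->mount_mode == MODE_WRITEBACK;
}

/* Buggy order: non-regular files return false even with no journal. */
static bool should_writeback_buggy(const struct model_inode *i)
{
    if (!i->is_regular_file)
        return false;
    if (!i->has_journal)
        return true;
    return tail_checks(i);
}

/* Fixed order: no journal means writeback, whatever the file type. */
static bool should_writeback_fixed(const struct model_inode *i)
{
    if (!i->has_journal)
        return true;
    if (!i->is_regular_file)
        return false;
    return tail_checks(i);
}
```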
2011-08-11 | ext4: Properly count journal credits for long symlinks | Eric Sandeen | 1 | -2/+4
Commit df5e6223407e ("ext4: fix deadlock in ext4_symlink() in ENOSPC conditions") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in jbd2_journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
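A toy model of why an under-counted reservation trips the assertion, assuming a hypothetical handle that only counts remaining credits; the check mirrors the quoted J_ASSERT_JH condition. The block counts in the usage are made up: the point is that every dirtied metadata block, including the one for the selinux xattr, must be covered by the reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical journal handle that only tracks remaining credits. */
struct model_handle { int h_buffer_credits; };

/* Analogue of the h_buffer_credits > 0 assertion in
 * jbd2_journal_dirty_metadata(): false where the kernel would oops. */
static bool dirty_metadata(struct model_handle *h)
{
    if (h->h_buffer_credits <= 0)
        return false;
    h->h_buffer_credits--;
    return true;
}

/* Dirty `nblocks` metadata blocks under a `reserved`-credit handle. */
static bool run_transaction(int reserved, int nblocks)
{
    struct model_handle h = { reserved };

    for (int i = 0; i < nblocks; i++)
        if (!dirty_metadata(&h))
            return false;   /* reservation overrun */
    return true;
}
```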
2011-08-03 | ext4: use kzalloc in ext4_kzalloc() | Mathias Krause | 1 | -1/+1
Commit 9933fc0i (ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()) introduced wrappers around k*alloc/vmalloc but made a typo in ext4_kzalloc() by calling kmalloc() instead of kzalloc(). Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 17 | -1873/+2088
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
  ext4: prevent memory leaks from ext4_mb_init_backend() on error path
  ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #
  ext4: use ext4_msg() instead of printk in mballoc
  ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info
  ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
  ext4: use the correct error exit path in ext4_init_inode_table()
  ext4: add missing kfree() on error return path in add_new_gdb()
  ext4: change umode_t in tracepoint headers to be an explicit __u16
  ext4: fix races in ext4_sync_parent()
  ext4: Fix overflow caused by missing cast in ext4_fallocate()
  ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
  ext4: simplify parameters of reserve_backup_gdb()
  ext4: simplify parameters of add_new_gdb()
  ext4: remove lock_buffer in bclean() and setup_new_group_blocks()
  ext4: simplify journal handling in setup_new_group_blocks()
  ext4: let setup_new_group_blocks() set multiple bits at a time
  ext4: fix a typo in ext4_group_extend()
  ext4: let ext4_group_add_blocks() handle 0 blocks quickly
  ext4: let ext4_group_add_blocks() return an error code
  ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()
  ...

Fix up conflict in fs/ext4/inode.c: commit aacfc19c626e ("fs: simplify the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO() function for the new simplified calling convention, while commit dae1e52cb126 ("ext4: move ext4_ind_* functions from inode.c to indirect.c") moved the function to another file.
2011-08-01 | ext4: prevent memory leaks from ext4_mb_init_backend() on error path | Yu Jian | 1 | -6/+6
In ext4_mb_init(), if the s_locality_group allocation fails it will currently cause the allocations made in ext4_mb_init_backend() to be leaked. Moving the ext4_mb_init_backend() allocation after the s_locality_group allocation avoids that problem. Signed-off-by: Yu Jian <yujian@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
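A sketch of the reordering in plain C, assuming hypothetical tracked_alloc()/tracked_free() helpers that count live allocations and a fault-injection switch for the locality-group allocation. In the old order a failure of the later step leaves the earlier allocation behind; allocating the locality groups first, and (as an assumption here) freeing them if the backend step fails, leaves nothing live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_allocs;            /* allocations not yet freed */
static bool fail_locality_alloc;   /* hypothetical fault injection */

static void *tracked_alloc(size_t n, bool fail)
{
    void *p = fail ? NULL : malloc(n);

    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

struct model_sb {
    void *backend;           /* stands in for ext4_mb_init_backend() state */
    void *locality_groups;   /* stands in for s_locality_group */
};

/* Old order: if the second allocation fails, the backend leaks. */
static int mb_init_old(struct model_sb *sb)
{
    sb->backend = tracked_alloc(64, false);
    if (!sb->backend)
        return -1;
    sb->locality_groups = tracked_alloc(64, fail_locality_alloc);
    if (!sb->locality_groups)
        return -1;           /* nothing frees sb->backend on this path */
    return 0;
}

/* New order: locality groups first, backend second, undo on failure. */
static int mb_init_new(struct model_sb *sb)
{
    sb->locality_groups = tracked_alloc(64, fail_locality_alloc);
    if (!sb->locality_groups)
        return -1;           /* nothing allocated yet, nothing to leak */
    sb->backend = tracked_alloc(64, false);
    if (!sb->backend) {
        tracked_free(sb->locality_groups);
        sb->locality_groups = NULL;
        return -1;
    }
    return 0;
}
```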
2011-08-01 | ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode # | Yu Jian | 1 | -1/+5
Signed-off-by: Yu Jian <yujian@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 | ext4: use ext4_msg() instead of printk in mballoc | Theodore Ts'o | 1 | -37/+42
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 | ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info | Theodore Ts'o | 3 | -13/+15
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 | ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree() | Theodore Ts'o | 2 | -18/+39
Introduce new helper functions which try kmalloc, and then fall back to vmalloc if necessary, and use them for allocating and deallocating s_flex_groups. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 | ext4: use the correct error exit path in ext4_init_inode_table() | Yongqiang Yang | 1 | -1/+1
This patch lets ext4_init_inode_table() handle errors correctly: on the error path, ext4_init_inode_table() should down_write() alloc_sem, which has been up_write()ed, and stop the journal handle it started. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 | switch posix_acl_equiv_mode() to umode_t * | Al Viro | 1 | -3/+1
... so that &inode->i_mode could be passed to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 | switch posix_acl_create() to umode_t * | Al Viro | 1 | -4/+1
so we can pass &inode->i_mode to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-30 | ext4: add missing kfree() on error return path in add_new_gdb() | Dan Carpenter | 1 | -0/+1
We added some more error handling in b40971426a "ext4: add error checking to calls to ext4_handle_dirty_metadata()". But we need to call kfree() as well to avoid a memory leak. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-30 | ext4: fix races in ext4_sync_parent() | Theodore Ts'o | 1 | -5/+21
Fix problems if fsync() races against a rename of a parent directory as pointed out by Al Viro in his own inimitable way:

>While we are at it, could somebody please explain what the hell is ext4
>doing in
>static int ext4_sync_parent(struct inode *inode)
>{
>        struct writeback_control wbc;
>        struct dentry *dentry = NULL;
>        int ret = 0;
>
>        while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
>                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
>                dentry = list_entry(inode->i_dentry.next,
>                                    struct dentry, d_alias);
>                if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
>                        break;
>                inode = dentry->d_parent->d_inode;
>                ret = sync_mapping_buffers(inode->i_mapping);
>                ...
>Note that dentry obviously can't be NULL there. dentry->d_parent is never
>NULL. And dentry->d_parent would better not be negative, for crying out
>loud! What's worse, there's no guarantees that dentry->d_parent will
>remain our parent over that sync_mapping_buffers() *and* that inode won't
>just be freed under us (after rename() and memory pressure leading to
>eviction of what used to be our dentry->d_parent)......

Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-27 | ext4: Fix overflow caused by missing cast in ext4_fallocate() | Utako Kusaka | 1 | -1/+1
The logical block number in map.l_blk is a __u32, so before we shift it left by the log2 of the block size, we need to cast it to a 64-bit type. Otherwise i_size can be corrupted on an ENOSPC.

# df -T /mnt/mp1
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/sda6 ext4 9843276 153056 9190200 2% /mnt/mp1
# fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile
fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device
# stat /mnt/mp1/testfile
  File: `/mnt/mp1/testfile'
  Size: 4293656576 Blocks: 19380440 IO Block: 4096 regular file
Device: 806h/2054d Inode: 12 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2011-07-25 13:01:31.414490496 +0900
Modify: 2011-07-25 13:01:31.414490496 +0900
Change: 2011-07-25 13:01:31.454490495 +0900

Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--
 fs/ext4/extents.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
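The arithmetic in isolation, with hypothetical function names: shifting a 32-bit block number left by 12 (4 KiB blocks) wraps at 2^32 bytes unless the value is widened before the shift. Block 1048576 (2^20) lands exactly on the 4 GiB boundary, where the 32-bit result becomes 0.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: the shift happens in 32 bits, then the result is widened. */
static uint64_t end_offset_buggy(uint32_t lblk, unsigned blkbits)
{
    return lblk << blkbits;
}

/* Fixed: widen to 64 bits first, then shift. */
static uint64_t end_offset_fixed(uint32_t lblk, unsigned blkbits)
{
    return (uint64_t)lblk << blkbits;
}
```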
2011-07-27 | ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole | Robin Dong | 1 | -2/+7
The old function ext4_ext_rm_idx was used only for the truncate case because it just removes the last index in the extent-index block. When punching a hole, we usually need to remove a "middle" index, so we must move the indexes after it forward. (I created a file with a depth-1 extent tree and punched a hole in the middle of it; the last index in the index block strangely disappeared, which is how I found this bug.) Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
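Removing an entry from the middle of an index array, as a sketch with a hypothetical struct model_idx: the entries after the removed one are shifted forward with memmove() before the count is decreased. Only decreasing the count, which is enough for truncate, would drop the last entry instead, which matches the "last index strangely disappeared" symptom.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical index entry: first logical block covered by a child. */
struct model_idx { unsigned ei_block; };

/* Remove entry `pos` (possibly in the middle): shift the entries after it
 * one slot toward the front, then shrink the count. */
static void rm_idx_at(struct model_idx *idx, unsigned *entries, unsigned pos)
{
    assert(pos < *entries);
    memmove(&idx[pos], &idx[pos + 1],
            (*entries - pos - 1) * sizeof(idx[0]));
    (*entries)--;
}
```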
2011-07-27 | ext4: simplify parameters of reserve_backup_gdb() | Yongqiang Yang | 1 | -5/+7
The reserve_backup_gdb() function only needs the block group number; there's no need to pass a pointer to struct ext4_new_group_data to it. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
2011-07-27 | ext4: simplify parameters of add_new_gdb() | Yongqiang Yang | 1 | -15/+24
add_new_gdb() only needs the block group number; there is no need to pass a pointer to struct ext4_new_group_data to add_new_gdb(). Instead of filling in a pointer to the struct buffer_head in add_new_gdb(), it's simpler to have the caller fetch it from the s_group_desc[] array. [Fixed error path to handle the case where struct buffer_head *primary hasn't been set yet. -- Ted] Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-27 | ext4: remove lock_buffer in bclean() and setup_new_group_blocks() | Yongqiang Yang | 1 | -4/+0
There is no need to lock the buffers since no one else should be touching these buffers besides the file system. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-26 | ext4: simplify journal handling in setup_new_group_blocks() | Yongqiang Yang | 1 | -19/+20
This patch simplifies journal handling in setup_new_group_blocks(). In the previous code, the block bitmap was modified all over setup_new_group_blocks(), and ext4_get_write_access() in extend_or_restart_transaction() was used to guarantee that the block bitmap stays in the new handle; this made things complicated. The previous commit changed things so that the modifications on the block bitmap are batched and done by ext4_set_bits() at the end of the for loop. This allows us to simplify things. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-26 | ext4: let setup_new_group_blocks() set multiple bits at a time | Yongqiang Yang | 3 | -18/+17
Rename mb_set_bits() to ext4_set_bits() and make it a global function so that setup_new_group_blocks() can use it. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
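A sketch of setting a run of bits at once, in the spirit of ext4_set_bits() but not its actual code: single bits up to a byte boundary, memset() for the whole bytes in the middle, then single bits for the tail. The bit numbering (bit n lives in byte n/8 at position n%8) and the helper names are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Set / test bit `nr`, little-endian bit order within each byte. */
static void set_bit_le(uint8_t *bm, unsigned nr)
{
    bm[nr >> 3] |= (uint8_t)(1u << (nr & 7));
}

static int test_bit_le(const uint8_t *bm, unsigned nr)
{
    return (bm[nr >> 3] >> (nr & 7)) & 1;
}

/* Set bits [cur, cur + len). */
static void set_bits_range(uint8_t *bm, unsigned cur, unsigned len)
{
    unsigned end = cur + len;

    while (cur < end && (cur & 7) != 0)     /* head: up to byte boundary */
        set_bit_le(bm, cur++);
    if (end - cur >= 8) {                   /* middle: whole bytes */
        unsigned nbytes = (end - cur) / 8;

        memset(bm + (cur >> 3), 0xff, nbytes);
        cur += nbytes * 8;
    }
    while (cur < end)                       /* tail: leftover bits */
        set_bit_le(bm, cur++);
}
```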
2011-07-26 | ext4: fix a typo in ext4_group_extend() | Yongqiang Yang | 1 | -1/+1
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-26 | ext4: let ext4_group_add_blocks() handle 0 blocks quickly | Yongqiang Yang | 1 | -0/+3
If ext4_group_add_blocks() is called with 0 blocks, make it return 0 without doing any extra work. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-26 | ext4: let ext4_group_add_blocks() return an error code | Yongqiang Yang | 3 | -9/+23
This patch lets ext4_group_add_blocks() return an error code if it fails, so that callers can handle errors correctly. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-26 | ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks() | Yongqiang Yang | 3 | -4/+4
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-26 | ext4: prevent a fs with errors from being resized | Yongqiang Yang | 1 | -0/+10
A filesystem with errors is not allowed to be resized; otherwise, it is easy to destroy the filesystem. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-26 | ext4: prevent parallel resizers by atomic bit ops | Yongqiang Yang | 4 | -40/+36
Before this patch, parallel resizers were allowed and protected by a mutex lock. Actually, there is no need to support parallel resizers, so this patch prevents them with atomic bit ops, as lock_page() and unlock_page() do. To do this, the patch removes the mutex lock s_resize_lock from struct ext4_sb_info and adds an unsigned long field named s_resize_flags, which indicates whether there is a resizer. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
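A userspace sketch of the single-resizer rule, using C11 atomics as stand-ins for the kernel's atomic bit operations; resize_begin(), resize_end() and RESIZING_BIT are hypothetical names. The first caller sets the bit and proceeds, a concurrent caller finds it already set and is refused instead of sleeping on a mutex, and clearing the bit lets the next resizer in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RESIZING_BIT 0   /* hypothetical bit number in s_resize_flags */

/* Test-and-set analogue: true if we became the (only) resizer. */
static bool resize_begin(atomic_ulong *s_resize_flags)
{
    unsigned long mask = 1UL << RESIZING_BIT;
    unsigned long old = atomic_fetch_or(s_resize_flags, mask);

    return (old & mask) == 0;   /* bit was clear: no resizer running */
}

/* Clear-bit analogue: allow the next resizer. */
static void resize_end(atomic_ulong *s_resize_flags)
{
    atomic_fetch_and(s_resize_flags, ~(1UL << RESIZING_BIT));
}
```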
2011-07-26 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback | Linus Torvalds | 1 | -2/+2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits)
  mm: properly reflect task dirty limits in dirty_exceeded logic
  writeback: don't busy retry writeback on new/freeing inodes
  writeback: scale IO chunk size up to half device bandwidth
  writeback: trace global_dirty_state
  writeback: introduce max-pause and pass-good dirty limits
  writeback: introduce smoothed global dirty limit
  writeback: consolidate variable names in balance_dirty_pages()
  writeback: show bdi write bandwidth in debugfs
  writeback: bdi write bandwidth estimation
  writeback: account per-bdi accumulated written pages
  writeback: make writeback_control.nr_to_write straight
  writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr()
  writeback: trace event writeback_queue_io
  writeback: trace event writeback_single_inode
  writeback: remove .nonblocking and .encountered_congestion
  writeback: remove writeback_control.more_io
  writeback: skip balance_dirty_pages() for in-memory fs
  writeback: add bdi_dirty_limit() kernel-doc
  writeback: avoid extra sync work at enqueue time
  writeback: elevate queue_io() into wb_writeback()
  ...

Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c
2011-07-26 | ext4: fix data corruption in inodes with journalled data | Jan Kara | 1 | -0/+29
When journalling data for an inode (either because it is a symlink or because the filesystem is mounted in data=journal mode), ext4_evict_inode() can discard unwritten data by calling truncate_inode_pages(). This is because we don't mark the buffer / page dirty when journalling data but only add the buffer to the running transaction and thus mm does not know there are still unwritten data. Fix the problem by carefully tracking transaction containing inode's data, committing this transaction, and writing uncheckpointed buffers when inode should be reaped. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-25 | fs: take the ACL checks to common code | Christoph Hellwig | 4 | -23/+6
Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 | kill boilerplates around posix_acl_create_masq() | Al Viro | 1 | -17/+9
new helper: posix_acl_create(&acl, gfp, mode_p). Replaces acl with modified clone, on failure releases acl and replaces with NULL. Returns 0 or -ve on error. All callers of posix_acl_create_masq() switched. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
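A sketch of the replace-in-place contract described above, using a hypothetical refcounted struct model_acl rather than the kernel's struct posix_acl; the masking rule and the fail_clone switch are made up for the demo. The caller passes a pointer to its pointer: on success it holds the modified clone, on failure the original has already been released and NULL is stored, so there is nothing left for the caller to clean up.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical refcounted ACL; not the kernel's struct posix_acl. */
struct model_acl {
    int refcount;
    unsigned short mode_bits;
};

static int live_acls;   /* ACL objects currently allocated */

static struct model_acl *acl_alloc(unsigned short mode_bits)
{
    struct model_acl *acl = malloc(sizeof(*acl));

    if (acl) {
        acl->refcount = 1;
        acl->mode_bits = mode_bits;
        live_acls++;
    }
    return acl;
}

static void acl_release(struct model_acl *acl)
{
    if (acl && --acl->refcount == 0) {
        free(acl);
        live_acls--;
    }
}

/* Replace *acl with a masked clone; on failure release it and store NULL.
 * Returns 0 or a negative errno. */
static int acl_replace_with_masked_clone(struct model_acl **acl,
                                         unsigned short umask_bits,
                                         int fail_clone)
{
    struct model_acl *clone =
        fail_clone ? NULL : acl_alloc((*acl)->mode_bits & ~umask_bits);

    acl_release(*acl);
    *acl = clone;
    return clone ? 0 : -ENOMEM;
}
```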
2011-07-25 | kill boilerplate around posix_acl_chmod_masq() | Al Viro | 1 | -24/+20
new helper: posix_acl_chmod(&acl, gfp, mode). Replaces acl with modified clone or with NULL if that has failed; returns 0 or -ve on error. All callers of posix_acl_chmod_masq() switched to that - they'd been doing exactly the same thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 | vfs: move ACL cache lookup into generic code | Linus Torvalds | 1 | -6/+0
This moves logic for checking the cached ACL values from low-level filesystems into generic code. The end result is a streamlined ACL check that doesn't need to load the inode->i_op->check_acl pointer at all for the common cached case. The filesystems also don't need to check for a non-blocking RCU walk case in their acl_check() functions, because that is all handled at a VFS layer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-23 | ext4: correct comment for ext4_ext_check_cache | Robin Dong | 1 | -1/+1
The comment for ext4_ext_check_cache has a small mistake. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-23 | ext4: correct the debug message in ext4_ext_insert_extent | Robin Dong | 1 | -1/+1
The debug message in ext4_ext_insert_extent before moving extent is incorrect (the "from xx to xx"). Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-23 | ext4: remove unused argument in ext4_ext_next_leaf_block | Robin Dong | 1 | -3/+2
The argument "inode" in function ext4_ext_next_allocated_block looks useless, so remove it. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-23 | ext4: remove ac_repeats from ext4_allocation_context | Tao Ma | 1 | -1/+0
ac_repeats isn't referenced in the mballoc code. So remove it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-23 | ext4: don't increment s_mb_buddies_generated in ext4_mb_release | Tao Ma | 1 | -1/+1
In ext4_mb_release, we use s_mb_buddies_generated++. Although the output is OK, I don't think we need this extra ++. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-23 | ext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddy | Tao Ma | 1 | -1/+1
ext4_mb_load_buddy() calls ext4_get_group_info() for setting both "grp" and "e4b->bd_info", but it could do "e4b->bd_info = grp". Reported-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-20 | fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers | Josef Bacik | 2 | -4/+36
Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. It seems some file systems, like ext3 and ocfs2, can drop taking the i_mutex altogether. For correctness' sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 | Ext4: handle SEEK_HOLE/SEEK_DATA generically | Josef Bacik | 1 | -0/+21
Since Ext4 has its own lseek we need to make sure it handles SEEK_HOLE/SEEK_DATA. For now just do the same thing that is done in the generic case, somebody else can come along and make it do fancy things later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
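What the generic behaviour amounts to, as a sketch with hypothetical names (MODEL_SEEK_DATA and MODEL_SEEK_HOLE stand in for SEEK_DATA and SEEK_HOLE): without extent information, the whole file up to i_size is treated as data, and the only hole is the virtual one at end of file. Offsets are assumed to be non-negative.

```c
#include <assert.h>
#include <errno.h>

enum model_whence { MODEL_SEEK_DATA, MODEL_SEEK_HOLE };

/*
 * Generic behaviour with no extent knowledge: [0, i_size) is one data
 * region followed by an implicit hole at EOF. Returns the new offset,
 * or -ENXIO if offset is at or beyond EOF. offset is assumed >= 0.
 */
static long long generic_seek_data_hole(long long offset, long long i_size,
                                        enum model_whence whence)
{
    if (offset >= i_size)
        return -ENXIO;
    if (whence == MODEL_SEEK_DATA)
        return offset;      /* every byte before EOF counts as data */
    return i_size;          /* the only hole starts at EOF */
}
```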