path: root/fs/btrfs/extent_io.c
Age | Commit message | Author | Files | Lines
2011-04-26 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -1/+1
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: cleanup error handling in inode.c Btrfs: put the right bio if we have an error Btrfs: free bitmaps properly when evicting the cache Btrfs: Free free_space item properly in btrfs_trim_block_group() btrfs: add missing spin_unlock to a rare exit path Btrfs: check return value of kmalloc() btrfs: fix wrong allocating flag when reading page Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()
2011-04-25 | btrfs: fix wrong allocating flag when reading page | Itaru Kitayama | 1 | -1/+1
The space cache uses extent_readpages() to read free space information, so we cannot use the GFP_KERNEL flag to allocate memory, or it may lead to a deadlock. Signed-off-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
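A minimal userspace sketch of the distinction, assuming simplified flag values: the `SKETCH_GFP_*` constants and both helper functions are hypothetical, not the definitions from `<linux/gfp.h>`. It shows that the NOFS mask is the KERNEL mask with only the FS bit removed, so the allocator may still sleep and start I/O, but memory reclaim will not call back into filesystem code.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only; the kernel's real
 * __GFP_* bits live in <linux/gfp.h> and have other values. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_WAIT 0x1u /* allocator may sleep */
#define SKETCH_GFP_IO   0x2u /* allocator may start block I/O */
#define SKETCH_GFP_FS   0x4u /* reclaim may call back into filesystem code */

#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOFS   (SKETCH_GFP_KERNEL & ~SKETCH_GFP_FS)

/* Reclaim triggered by an allocation may re-enter the filesystem
 * only when the FS bit is present in the mask. */
bool may_reenter_fs(sketch_gfp_t mask)
{
    return (mask & SKETCH_GFP_FS) != 0;
}

/* Mask for allocations made while reading the space cache: strip the
 * FS bit so reclaim cannot recurse into btrfs from this path. */
sketch_gfp_t space_cache_read_mask(sketch_gfp_t requested)
{
    return requested & ~SKETCH_GFP_FS;
}
```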
2011-04-18 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -20/+62
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits) Btrfs: fix free space cache leak Btrfs: avoid taking the chunk_mutex in do_chunk_alloc Btrfs end_bio_extent_readpage should look for locked bits Btrfs: don't force chunk allocation in find_free_extent Btrfs: Check validity before setting an acl Btrfs: Fix incorrect inode nlink in btrfs_link() Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir() Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr() Btrfs: make uncache_state unconditional btrfs: using cached extent_state in set/unlock combinations Btrfs: avoid taking the trans_mutex in btrfs_end_transaction Btrfs: fix subvolume mount by name problem when default mount subvolume is set fix user annotation in ioctl.c Btrfs: check for duplicate iov_base's when doing dio reads btrfs: properly handle overlapping areas in memmove_extent_buffer Btrfs: fix memory leaks in btrfs_new_inode() Btrfs: check for duplicate iov_base's when doing dio reads Btrfs: reuse the extent_map we found when calling btrfs_get_extent Btrfs: do not use async submit for small DIO io's Btrfs: don't split dio bios if we don't have to ...
2011-04-16 | Btrfs end_bio_extent_readpage should look for locked bits | Chris Mason | 1 | -1/+1
A recent commit caches the extent state in end_bio_extent_readpage, but the search it does should look for locked extents. This fixes the search so that the caching is more effective. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-12 | Btrfs: make uncache_state unconditional | Chris Mason | 1 | -5/+3
The extent_io code can take cached pointers into the extent state trees, and these can make lookups much faster in common operations. The caching only happens when specific bits are set that prevent merging and splitting of the extent state. A helper function was added to uncache the state, and it was testing the same set of conditionals. This can leak in very strange corner cases where the lock bit goes away unexpectedly. The uncaching should be unconditional: once we have a ref on the extent, we should always give it up. Signed-off-by: Chris Mason <chris.mason@oracle.com>
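A simplified model of the reference handling, assuming a plain `int` counter in place of the kernel's atomic reference count; `struct sketch_state`, `cache_state()` and `uncache_state()` are hypothetical names. Caching stays conditional on the locking bit, but the release path no longer re-tests that bit, so a reference taken earlier is always dropped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct extent_state: only a reference
 * count and one bit that prevents merging/splitting matter here. */
struct sketch_state {
    int refs;
    bool locked;
};

/* Take a cached reference, but only while the state is locked and
 * therefore cannot be merged or split underneath us. */
void cache_state(struct sketch_state *s, struct sketch_state **cached)
{
    if (cached && !*cached && s->locked) {
        s->refs++;
        *cached = s;
    }
}

/* Unconditional release: if a reference was cached, drop it, even if
 * the locked bit has disappeared in the meantime. */
void uncache_state(struct sketch_state **cached)
{
    if (cached && *cached) {
        (*cached)->refs--;
        *cached = NULL;
    }
}
```

Had `uncache_state()` also required `locked`, clearing the bit between the two calls would leave `refs` permanently elevated, which is the leak the commit describes.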
2011-04-11 | btrfs: using cached extent_state in set/unlock combinations | Arne Jansen | 1 | -17/+53
In several places the sequence (set_extent_uptodate, unlock_extent) is used. This leads to a duplicate lookup of the extent state. This patch lets set_extent_uptodate return a cached extent_state which can be passed to unlock_extent_cached. The occurrences of the above sequences are updated to use the cache. Only end_bio_extent_readpage is updated differently: it first gets a cached state to pass to readpage_end_io_hook, as the prototype requires, and that state is later used for set/unlock. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
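A sketch of why a cached state saves a lookup, assuming a linear table of ranges in place of the real extent state rbtree. `struct range_state`, `lookup()`, `set_uptodate_then_unlock()` and the `BIT_*` values are all hypothetical. The second lookup in the set/unlock pair is answered from the cached pointer, so only one full search runs.

```c
#include <stddef.h>

/* A range [start, end] with some state bits; stands in for extent_state. */
struct range_state {
    unsigned long start, end;
    unsigned int bits;
};

#define BIT_UPTODATE 0x1u
#define BIT_LOCKED   0x2u

/* Counts full searches so the saving from the cache is visible. */
unsigned int search_count;

/* Return the state covering 'offset'.  If *cached already covers it,
 * skip the search; otherwise search and remember the result. */
struct range_state *lookup(struct range_state *tbl, size_t n,
                           unsigned long offset, struct range_state **cached)
{
    if (cached && *cached &&
        (*cached)->start <= offset && offset <= (*cached)->end)
        return *cached;
    search_count++;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].start <= offset && offset <= tbl[i].end) {
            if (cached)
                *cached = &tbl[i];
            return &tbl[i];
        }
    }
    return NULL;
}

/* Set uptodate, then unlock, sharing one cached lookup between them. */
void set_uptodate_then_unlock(struct range_state *tbl, size_t n,
                              unsigned long offset)
{
    struct range_state *cached = NULL;
    struct range_state *s = lookup(tbl, n, offset, &cached);

    if (s)
        s->bits |= BIT_UPTODATE;
    s = lookup(tbl, n, offset, &cached); /* served from the cache */
    if (s)
        s->bits &= ~BIT_LOCKED;
}
```

In the kernel the cached pointer also pins the state with a reference, which this sketch leaves out.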
2011-04-11 | btrfs: properly handle overlapping areas in memmove_extent_buffer | Sergei Trofimovich | 1 | -3/+11
Fix data corruption caused by memcpy() usage on overlapping data. I first observed it when I found a User Mode Linux crash on btrfs. The call chain is the following:
------------[ cut here ]------------
WARNING: at /home/slyfox/linux-2.6/fs/btrfs/extent_io.c:3900 memcpy_extent_buffer+0x1a5/0x219()
Call Trace:
6fa39a58: [<601b495e>] _raw_spin_unlock_irqrestore+0x18/0x1c
6fa39a68: [<60029ad9>] warn_slowpath_common+0x59/0x70
6fa39aa8: [<60029b05>] warn_slowpath_null+0x15/0x17
6fa39ab8: [<600efc97>] memcpy_extent_buffer+0x1a5/0x219
6fa39b48: [<600efd9f>] memmove_extent_buffer+0x94/0x208
6fa39bc8: [<600becbf>] btrfs_del_items+0x214/0x473
6fa39c78: [<600ce1b0>] btrfs_delete_one_dir_name+0x7c/0xda
6fa39cc8: [<600dad6b>] __btrfs_unlink_inode+0xad/0x25d
6fa39d08: [<600d7864>] btrfs_start_transaction+0xe/0x10
6fa39d48: [<600dc9ff>] btrfs_unlink_inode+0x1b/0x3b
6fa39d78: [<600e04bc>] btrfs_unlink+0x70/0xef
6fa39dc8: [<6007f0d0>] vfs_unlink+0x58/0xa3
6fa39df8: [<60080278>] do_unlinkat+0xd4/0x162
6fa39e48: [<600517db>] call_rcu_sched+0xe/0x10
6fa39e58: [<600452a8>] __put_cred+0x58/0x5a
6fa39e78: [<6007446c>] sys_faccessat+0x154/0x166
6fa39ed8: [<60080317>] sys_unlink+0x11/0x13
6fa39ee8: [<60016b80>] handle_syscall+0x58/0x70
6fa39f08: [<60021377>] userspace+0x2d4/0x381
6fa39fc8: [<60014507>] fork_handler+0x62/0x69
---[ end trace 70b0ca2ef0266b93 ]---
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09302.html Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
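The underlying rule is the C library's: `memcpy()` has undefined behaviour for overlapping ranges, while `memmove()` must handle them. A minimal sketch of an overlap-safe move inside one flat buffer, assuming plain byte offsets rather than the page-spanning extent buffer the kernel works with; `move_within()` and `ranges_overlap()` are names chosen for illustration.

```c
#include <stddef.h>

/* True when [a, a + len) and [b, b + len) share at least one byte. */
int ranges_overlap(size_t a, size_t b, size_t len)
{
    return a < b ? b - a < len : a - b < len;
}

/* Move len bytes inside buf from src_off to dst_off.  When the
 * destination lies after the source, copy from the end backwards so
 * no source byte is overwritten before it has been read. */
void move_within(unsigned char *buf, size_t dst_off, size_t src_off,
                 size_t len)
{
    if (dst_off == src_off || len == 0)
        return;
    if (dst_off < src_off) {
        for (size_t i = 0; i < len; i++)
            buf[dst_off + i] = buf[src_off + i];
    } else {
        for (size_t i = len; i > 0; i--)
            buf[dst_off + i - 1] = buf[src_off + i - 1];
    }
}
```

Moving 4 bytes of `"abcdefgh"` from offset 0 to offset 2 yields `"ababcdgh"`; a naive forward copy would instead produce `"abababgh"`, the kind of corruption the commit fixes.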
2011-03-28 | Merge branch 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -0/+3
* 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (45 commits) Btrfs: fix __btrfs_map_block on 32 bit machines btrfs: fix possible deadlock by clearing __GFP_FS flag btrfs: check link counter overflow in link(2) btrfs: don't mess with i_nlink of unlocked inode in rename() Btrfs: check return value of btrfs_alloc_path() Btrfs: fix OOPS of empty filesystem after balance Btrfs: fix memory leak of empty filesystem after balance Btrfs: fix return value of setflags ioctl Btrfs: fix uncheck memory allocations btrfs: make inode ref log recovery faster Btrfs: add btrfs_trim_fs() to handle FITRIM Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP Btrfs: make update_reserved_bytes() public btrfs: return EXDEV when linking from different subvolumes Btrfs: Per file/directory controls for COW and compression Btrfs: add datacow flag in inode flag btrfs: use GFP_NOFS instead of GFP_KERNEL Btrfs: check return value of read_tree_block() btrfs: properly access unaligned checksum buffer ... Fix up trivial conflicts in fs/btrfs/volumes.c due to plug removal in the block layer.
2011-03-28 | Btrfs: add initial tracepoint support for btrfs | liubo | 1 | -0/+2
Tracepoints can provide insight into why btrfs hits bugs and can be greatly helpful for debugging, e.g.
dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)
Here is what I have added:
1) ordered_extent: btrfs_ordered_extent_add, btrfs_ordered_extent_remove, btrfs_ordered_extent_start, btrfs_ordered_extent_put
These provide critical information to understand how ordered_extents are updated.
2) extent_map: btrfs_get_extent
extent_map is used in both read and write cases, and it is useful for tracking how btrfs-specific IO is running.
3) writepage: __extent_writepage, btrfs_writepage_end_io_hook
Pages are critical resources and produce a lot of corner cases during writeback, so it is valuable to know how a page is written to disk.
4) inode: btrfs_inode_new, btrfs_inode_request, btrfs_inode_evict
These can show where and when an inode is created, and when an inode is evicted.
5) sync: btrfs_sync_file, btrfs_sync_fs
These show sync arguments.
6) transaction: btrfs_transaction_commit
In a transaction-based filesystem, it is useful to know the generation and who does the commit.
7) back reference and cow: btrfs_delayed_tree_ref, btrfs_delayed_data_ref, btrfs_delayed_ref_head, btrfs_cow_block
Btrfs natively supports back references; these tracepoints are helpful in understanding btrfs's COW mechanism.
8) chunk: btrfs_chunk_alloc, btrfs_chunk_free
A chunk is a link between physical offset and logical offset, and stands for space information in btrfs; these are helpful in tracing space usage.
9) reserved_extent: btrfs_reserved_extent_alloc, btrfs_reserved_extent_free
These can show how btrfs uses its space.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-24 | Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block | Linus Torvalds | 1 | -1/+1
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}
2011-03-17 | Btrfs: return error if the range we want to map is bogus | Josef Bacik | 1 | -0/+1
Currently, if we have corrupt metadata, map_extent_buffer will complain about it but not return an error, so the caller has no idea a problem was hit. Fix this. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
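A sketch of the bounds check plus error propagation, assuming a buffer described only by its length and `-EINVAL` as the error code (both assumptions; the message does not name the code returned). `check_map_range()` and `read_item()` are hypothetical.

```c
#include <errno.h>

/* Check that [offset, offset + min_len) lies inside a buffer of eb_len
 * bytes, written so that offset + min_len cannot overflow.  Returns 0
 * on success or -EINVAL for a bogus range instead of only warning. */
int check_map_range(unsigned long eb_len, unsigned long offset,
                    unsigned long min_len)
{
    if (offset > eb_len || min_len > eb_len - offset)
        return -EINVAL;
    return 0;
}

/* Hypothetical caller: pass the error up rather than using the range. */
int read_item(unsigned long eb_len, unsigned long offset, unsigned long len)
{
    int ret = check_map_range(eb_len, offset, len);

    if (ret)
        return ret;
    /* ... map and read the validated range here ... */
    return 0;
}
```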
2011-03-13 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -6/+27
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: break out of shrink_delalloc earlier btrfs: fix not enough reserved space btrfs: fix dip leak Btrfs: make sure not to return overlapping extents to fiemap Btrfs: deal with short returns from copy_from_user Btrfs: fix regressions in copy_from_user handling
2011-03-10 | Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core | Jens Axboe | 1 | -1/+1
Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 | block: kill off REQ_UNPLUG | Jens Axboe | 1 | -1/+1
With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-08 | Btrfs: make sure not to return overlapping extents to fiemap | Chris Mason | 1 | -6/+27
The btrfs fiemap code was incorrectly returning duplicate or overlapping extents in some cases. cp was blindly trusting this result and we would end up with a destination file that was bigger than the original because some bytes were copied twice. The fix here adjusts our offsets to make sure we're always moving forward in the fiemap results. Signed-off-by: Chris Mason <chris.mason@oracle.com>
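A sketch of the forward-only rule, assuming the candidate extents arrive sorted by logical offset. `struct sketch_extent` and `clip_forward()` are hypothetical and do not reproduce the btrfs fiemap code. Each reported range starts no earlier than the end of the previous one, so a copy driven by the results never copies a byte twice.

```c
#include <stddef.h>

/* Hypothetical extent record: logical start and length in bytes. */
struct sketch_extent {
    unsigned long long logical;
    unsigned long long len;
};

/* Trim extents so every reported range begins at or after the end of
 * the previous one; fully duplicated extents are dropped.  Input must
 * be sorted by logical offset.  Returns the number written to out. */
size_t clip_forward(const struct sketch_extent *in, size_t n,
                    struct sketch_extent *out)
{
    unsigned long long next = 0; /* first byte not yet reported */
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long long start = in[i].logical;
        unsigned long long end = in[i].logical + in[i].len;

        if (end <= next)
            continue;       /* already fully reported */
        if (start < next)
            start = next;   /* overlapping: trim the front */
        out[m].logical = start;
        out[m].len = end - start;
        m++;
        next = end;
    }
    return m;
}
```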
2011-02-25 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -39/+99
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix fiemap bugs with delalloc Btrfs: set FMODE_EXCL in btrfs_device->mode Btrfs: make btrfs_rm_device() fail gracefully Btrfs: Avoid accessing unmapped kernel address Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl Btrfs: allow balance to explicitly allocate chunks as it relocates Btrfs: put ENOSPC debugging under a mount option
2011-02-23 | Btrfs: fix fiemap bugs with delalloc | Chris Mason | 1 | -39/+99
The Btrfs fiemap code wasn't properly returning delalloc extents, so applications that trust fiemap to decide if there are holes in the file see holes instead of delalloc. This reworks the btrfs fiemap code, adding a get_extent helper that searches for delalloc ranges and also adding a helper for extent_fiemap that skips past holes in the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-15 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -4/+44
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check return value of alloc_extent_map() Btrfs - Fix memory leak in btrfs_init_new_device() btrfs: prevent heap corruption in btrfs_ioctl_space_info() Btrfs: Fix balance panic Btrfs: don't release pages when we can't clear the uptodate bits Btrfs: fix page->private races
2011-02-14 | Btrfs: don't release pages when we can't clear the uptodate bits | Chris Mason | 1 | -1/+9
Btrfs tracks uptodate state in an rbtree as well as in the page bits. This is supposed to enable us to use block sizes other than the page size, but there are a few parts still missing before that completely works. However, our readpage routine trusts this additional range-based tracking of uptodateness, much in the same way the buffer head up to date bits are trusted for the other filesystems. The problem is that sometimes we need to allocate memory in order to split records in the rbtree, even when we are just clearing bits. This can be difficult when our clearing function is called with GFP_ATOMIC, which can happen in the releasepage path. So, what happens today looks like this:
    releasepage called with GFP_ATOMIC
    btrfs_releasepage calls clear_extent_bit
    clear_extent_bit fails to allocate RAM, leaving the up to date bit set
    btrfs_releasepage returns success
The end result is the page being gone, but btrfs thinking the range is up to date. Later on, if someone tries to read that same page, the btrfs readpage code will return immediately, thinking the page is already up to date. This commit fixes things to fail the releasepage when we can't clear the extent state bits. It covers both data pages and metadata tree blocks. Signed-off-by: Chris Mason <chris.mason@oracle.com>
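A simplified model of the fixed decision, assuming a single boolean stands in for the range's uptodate bit and `alloc_ok` simulates whether the record split could allocate memory. All names here are hypothetical. The page is released only when the range state was actually cleared.

```c
#include <stdbool.h>

/* Hypothetical range record holding only the uptodate bit. */
struct sketch_range {
    bool uptodate;
};

/* Clearing may need to split a record, which needs an allocation that
 * can fail under GFP_ATOMIC.  Returns 0 on success, -1 on failure. */
int clear_uptodate(struct sketch_range *r, bool alloc_ok)
{
    if (!alloc_ok)
        return -1; /* bit stays set */
    r->uptodate = false;
    return 0;
}

/* Returns true when the page may be freed.  If the bit could not be
 * cleared, refuse, so readpage never trusts a stale uptodate bit. */
bool sketch_releasepage(struct sketch_range *r, bool alloc_ok)
{
    if (clear_uptodate(r, alloc_ok) < 0)
        return false;
    return true;
}
```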
2011-02-14 | Btrfs: fix page->private races | Chris Mason | 1 | -3/+35
There is a race where btrfs_releasepage can drop the page->private contents just as alloc_extent_buffer is setting up pages for metadata. Because of how the Btrfs page flags work, this results in us skipping the crc on the page during IO. This patch solves the race by waiting until after the extent buffer is inserted into the radix tree before it sets page private. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-07 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -2/+4
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (33 commits) Btrfs: Fix page count calculation btrfs: Drop __exit attribute on btrfs_exit_compress btrfs: cleanup error handling in btrfs_unlink_inode() Btrfs: exclude super blocks when we read in block groups Btrfs: make sure search_bitmap finds something in remove_from_bitmap btrfs: fix return value check of btrfs_start_transaction() btrfs: checking NULL or not in some functions Btrfs: avoid uninit variable warnings in ordered-data.c Btrfs: catch errors from btrfs_sync_log Btrfs: make shrink_delalloc a little friendlier Btrfs: handle no memory properly in prepare_pages Btrfs: do error checking in btrfs_del_csums Btrfs: use the global block reserve if we cannot reserve space Btrfs: do not release more reserved bytes to the global_block_rsv than we need Btrfs: fix check_path_shared so it returns the right value btrfs: check return value of btrfs_start_ioctl_transaction() properly btrfs: fix return value check of btrfs_join_transaction() fs/btrfs/inode.c: Add missing IS_ERR test btrfs: fix missing break in switch phrase btrfs: fix several uncheck memory allocations ...
2011-02-01 | btrfs: checking NULL or not in some functions | Tsutomu Itoh | 1 | -0/+2
Because NULL is returned when a memory allocation fails, check whether the return value is NULL. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-28 | btrfs: fix uncheck memory allocation in btrfs_submit_compressed_read | liubo | 1 | -2/+2
btrfs_submit_compressed_read() lacks memory allocation checks and a corresponding error path. After this fix, in the "no memory" case, an errno is returned to userland step by step, telling users that the operation cannot continue. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-17 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -1/+6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) Btrfs: forced readonly mounts on errors btrfs: Require CAP_SYS_ADMIN for filesystem rebalance Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() btrfs: check NULL or not btrfs: Don't pass NULL ptr to func that may deref it. btrfs: mount failure return value fix btrfs: Mem leak in btrfs_get_acl() btrfs: fix wrong free space information of btrfs btrfs: make the chunk allocator utilize the devices better btrfs: restructure find_free_dev_extent() btrfs: fix wrong calculation of stripe size btrfs: try to reclaim some space when chunk allocation fails btrfs: fix wrong data space statistics fs/btrfs: Fix build of ctree Btrfs: fix off by one while setting block groups readonly Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls Btrfs: Add readonly snapshots support Btrfs: Refactor btrfs_ioctl_snap_create() btrfs: Extract duplicate decompress code ...
2011-01-16 | btrfs: check NULL or not | Tsutomu Itoh | 1 | -0/+2
We should check whether these functions return NULL. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-22 | btrfs: Allow to add new compression algorithm | Li Zefan | 1 | -1/+4
Make the code aware of the compression type, instead of always assuming zlib compression. Also turn the zlib workspace functions into common code for all compression types. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2010-11-29 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -15/+62
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits) Btrfs: don't use migrate page without CONFIG_MIGRATION Btrfs: deal with DIO bios that span more than one ordered extent Btrfs: setup blank root and fs_info for mount time Btrfs: fix fiemap Btrfs - fix race between btrfs_get_sb() and umount Btrfs: update inode ctime when using links Btrfs: make sure new inode size is ok in fallocate Btrfs: fix typo in fallocate to make it honor actual size Btrfs: avoid NULL pointer deref in try_release_extent_buffer Btrfs: make btrfs_add_nondir take parent inode as an argument Btrfs: hold i_mutex when calling btrfs_log_dentry_safe Btrfs: use dget_parent where we can UPDATED Btrfs: fix more ESTALE problems with NFS Btrfs: handle NFS lookups properly btrfs: make 1-bit signed fileds unsigned btrfs: Show device attr correctly for symlinks btrfs: Set file size correctly in file clone btrfs: Check if dest_offset is block-size aligned before cloning file Btrfs: handle the space_cache option properly btrfs: Fix early enospc because 'unused' calculated with wrong sign. ...
2010-11-27 | Btrfs: fix fiemap | Josef Bacik | 1 | -9/+54
There are currently two big problems with FIEMAP:
1) We return extents for holes. This isn't supposed to happen; we should simply not return extents for holes, and userspace then interprets the lack of an extent as a hole.
2) We sometimes don't set FIEMAP_EXTENT_LAST properly. This is because we wait to see an EXTENT_FLAG_VACANCY flag on the em, but this won't happen if, say, we ask fiemap to map up to the last extent in a file and there is nothing but holes up to i_size.
To fix this we need to look up the last extent in this file and save its logical offset, so that if we try to map that extent we can be sure to set FIEMAP_EXTENT_LAST. With this patch we now pass xfstest 225, which we never have before. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
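A sketch of both rules, assuming the extent maps are available as a sorted array (the commit instead looks up the last extent in the file). `struct sketch_em`, `struct sketch_fe`, `SKETCH_EXTENT_LAST` and `sketch_fiemap()` are hypothetical stand-ins for the real FIEMAP structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EXTENT_LAST 0x1u /* stands in for FIEMAP_EXTENT_LAST */

/* Hypothetical extent map entry: either a hole or a real extent. */
struct sketch_em {
    unsigned long long start, len;
    bool hole;
};

/* Hypothetical record handed back to userspace. */
struct sketch_fe {
    unsigned long long logical, len;
    unsigned int flags;
};

/* Report only real extents (gaps imply holes) and flag the extent that
 * starts at the file's last real extent offset as LAST, even when only
 * holes follow it.  Returns the number of records written. */
size_t sketch_fiemap(const struct sketch_em *ems, size_t n,
                     struct sketch_fe *out)
{
    unsigned long long last_off = 0;
    size_t m = 0;

    /* Remember where the last real extent starts. */
    for (size_t i = 0; i < n; i++)
        if (!ems[i].hole)
            last_off = ems[i].start;

    for (size_t i = 0; i < n; i++) {
        if (ems[i].hole)
            continue; /* holes are never reported */
        out[m].logical = ems[i].start;
        out[m].len = ems[i].len;
        out[m].flags = ems[i].start == last_off ? SKETCH_EXTENT_LAST : 0;
        m++;
    }
    return m;
}
```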
2010-11-21 | Btrfs: avoid NULL pointer deref in try_release_extent_buffer | Chris Mason | 1 | -2/+4
If we fail to find a pointer in the radix tree, don't try to deref the NULL one we do have. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 | btrfs: cleanup duplicate bio allocating functions | Miao Xie | 1 | -4/+4
extent_bio_alloc() and compressed_bio_alloc() are similar; clean up the duplicated source code. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-30 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -85/+83
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (39 commits) Btrfs: deal with errors from updating the tree log Btrfs: allow subvol deletion by unprivileged user with -o user_subvol_rm_allowed Btrfs: make SNAP_DESTROY async Btrfs: add SNAP_CREATE_ASYNC ioctl Btrfs: add START_SYNC, WAIT_SYNC ioctls Btrfs: async transaction commit Btrfs: fix deadlock in btrfs_commit_transaction Btrfs: fix lockdep warning on clone ioctl Btrfs: fix clone ioctl where range is adjacent to extent Btrfs: fix delalloc checks in clone ioctl Btrfs: drop unused variable in block_alloc_rsv Btrfs: cleanup warnings from gcc 4.6 (nonbugs) Btrfs: Fix variables set but not read (bugs found by gcc 4.6) Btrfs: Use ERR_CAST helpers Btrfs: use memdup_user helpers Btrfs: fix raid code for removing missing drives Btrfs: Switch the extent buffer rbtree into a radix tree Btrfs: restructure try_release_extent_buffer() Btrfs: use the flusher threads for delalloc throttling Btrfs: tune the chunk allocation to 5% of the FS as metadata ... Fix up trivial conflicts in fs/btrfs/super.c and fs/fs-writeback.c, and remove use of INIT_RCU_HEAD in fs/btrfs/extent_io.c (that init macro was useless and removed in commit 5e8067adfdba: "rcu head remove init")
2010-10-29 | Btrfs: cleanup warnings from gcc 4.6 (nonbugs) | Andi Kleen | 1 | -9/+0
These are all the cases where a variable is set but not read that are not bugs as far as I can see, but simply leftovers. Still needs more review. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29 | Btrfs: Fix variables set but not read (bugs found by gcc 4.6) | Andi Kleen | 1 | -0/+2
These are all the cases where a variable is set but not read that are really bugs.
- A couple of incorrect error-handling cases fixed.
- One incorrect use of an allocation policy.
- Some other things.
Still needs more review. Found by gcc 4.6's new warnings. [akpm@linux-foundation.org: fix build. Might have been bitrot] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29 | Btrfs: Switch the extent buffer rbtree into a radix tree | Miao Xie | 1 | -67/+47
This patch reduces the CPU time spent in the extent buffer search by using a radix tree instead of the rbtree and the RCU lock instead of the spin lock. I did a quick test with the benchmark tool[1] and found that the patch improves on the file creation/deletion performance problem I reported[2].
Before applying this patch:
Create files: Total files: 50000 Total time: 0.971531 Average time: 0.000019
Delete files: Total files: 50000 Total time: 1.366761 Average time: 0.000027
After applying this patch:
Create files: Total files: 50000 Total time: 0.927455 Average time: 0.000019
Delete files: Total files: 50000 Total time: 1.292280 Average time: 0.000026
[1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
[2] http://marc.info/?l=linux-btrfs&m=128212635122920&w=2
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
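The totals above work out to roughly a 4.5% improvement for file creation and 5.4% for deletion. The arithmetic as a small C helper, using only the numbers quoted in the message; `pct_faster()` and `print_radix_tree_gain()` are hypothetical names.

```c
#include <stdio.h>

/* Percentage by which 'after' is faster than 'before'. */
double pct_faster(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Print the improvement for the totals quoted in the commit message. */
void print_radix_tree_gain(void)
{
    printf("create: %.1f%%\n", pct_faster(0.971531, 0.927455));
    printf("delete: %.1f%%\n", pct_faster(1.366761, 1.292280));
}
```

Calling `print_radix_tree_gain()` prints `create: 4.5%` and `delete: 5.4%`.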
2010-10-29 | Btrfs: restructure try_release_extent_buffer() | Miao Xie | 1 | -11/+37
Restructure try_release_extent_buffer() and write a function to release the extent buffer. It will be used later. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-07-06 | writeback: remove writeback_inodes_wbc | Christoph Hellwig | 1 | -2/+0
This was just an odd wrapper around writeback_inodes_wb. Removing it also allows us to get rid of the bdi member of struct writeback_control, which was rather out of place there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-05-25 | Btrfs: rework O_DIRECT enospc handling | Chris Mason | 1 | -5/+4
This changes the O_DIRECT write code to mark extents as delalloc while it is processing them. Yan Zheng has reworked the enospc accounting based on tracking delalloc extents, and this makes it much easier to track enospc in the O_DIRECT code. There are a few special cases with the O_DIRECT code, though: it only sets the EXTENT_DELALLOC bits, instead of doing EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because we don't want to mess with clearing the dirty and uptodate bits when things go wrong. This is important because there are no pages in the page cache, so any extent state structs that we put in the tree won't get freed by releasepage. We have to clear them ourselves as the DIO ends. With this commit, we reserve space in btrfs_file_aio_write, and then as each btrfs_direct_IO call progresses it sets EXTENT_DELALLOC on the range. btrfs_get_blocks_direct is responsible for clearing the delalloc at the same time it drops the extent lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>
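A minimal sketch of the bit choice, assuming arbitrary bit values (the real `EXTENT_*` constants are not reproduced here). The `SK_EXTENT_*` macros and the three helpers are hypothetical. Because the O_DIRECT path sets only the delalloc bit, clearing that one bit when the DIO ends leaves nothing behind in the state tree for releasepage to clean up.

```c
/* Hypothetical bit values; not the real EXTENT_* constants. */
#define SK_EXTENT_DIRTY    (1u << 0)
#define SK_EXTENT_UPTODATE (1u << 1)
#define SK_EXTENT_DELALLOC (1u << 2)

/* The combination the commit message says O_DIRECT avoids setting. */
unsigned int buffered_delalloc_bits(void)
{
    return SK_EXTENT_DELALLOC | SK_EXTENT_DIRTY | SK_EXTENT_UPTODATE;
}

/* O_DIRECT sets DELALLOC only: there are no page-cache pages that
 * would later clear DIRTY or UPTODATE for this range. */
unsigned int dio_delalloc_bits(void)
{
    return SK_EXTENT_DELALLOC;
}

/* When the DIO ends, the writer clears DELALLOC itself; the return
 * value is whatever would be left behind in the state tree. */
unsigned int dio_end(unsigned int bits)
{
    return bits & ~SK_EXTENT_DELALLOC;
}
```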
2010-05-25 | Btrfs: use async helpers for DIO write checksumming | Chris Mason | 1 | -1/+1
The async helper threads offload crc work onto all the CPUs, and make streaming writes much faster. This changes the O_DIRECT write code to use them. The only small complication was that we need to pass in the logical offset in the file for each bio, because we can't find it in the bio's pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25 | Btrfs: do aio_write instead of write | Josef Bacik | 1 | -1/+10
In order for AIO to work, we need to implement aio_write. This patch converts our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and nothing broke, and the AIO stuff magically started working. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25 | Btrfs: Update metadata reservation for delayed allocation | Yan, Zheng | 1 | -33/+30
Introduce metadata reservation context for delayed allocation and update various related functions. This patch also introduces EXTENT_FIRST_DELALLOC control bit for set/clear_extent_bit. It tells set/clear_bit_hook whether they are processing the first extent_state with EXTENT_DELALLOC bit set. This change is important if set/clear_extent_bit involves multiple extent_state. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
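A sketch of how a one-shot control bit can mark the first record, under an assumption the message does not state: that the caller has already accounted for one outstanding extent, and the hook adds one for each further `extent_state` the range spans. The `SK_EXTENT_*` values and both functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical bit values, not the real EXTENT_* constants. */
#define SK_EXTENT_DELALLOC       (1u << 0)
#define SK_EXTENT_FIRST_DELALLOC (1u << 1) /* control bit, never stored */

/* Hook run once per extent_state record that gains DELALLOC.  The
 * first record consumes the control bit; later records each add one
 * outstanding extent. */
void sketch_set_bit_hook(unsigned int *outstanding, unsigned int *bits)
{
    if (!(*bits & SK_EXTENT_DELALLOC))
        return;
    if (*bits & SK_EXTENT_FIRST_DELALLOC)
        *bits &= ~SK_EXTENT_FIRST_DELALLOC;
    else
        (*outstanding)++;
}

/* Set DELALLOC on a range that is split across nstates records; the
 * same bits word is passed to every hook call, so only the first call
 * sees the control bit. */
void sketch_set_delalloc(unsigned int *outstanding, size_t nstates)
{
    unsigned int bits = SK_EXTENT_DELALLOC | SK_EXTENT_FIRST_DELALLOC;

    for (size_t i = 0; i < nstates; i++)
        sketch_set_bit_hook(outstanding, &bits);
}
```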
2010-04-05 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds | 1 | -14/+1
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: add check for changed leaves in setup_leaf_for_split Btrfs: create snapshot references in same commit as snapshot Btrfs: fix small race with delalloc flushing waitqueue's Btrfs: use add_to_page_cache_lru, use __page_cache_alloc Btrfs: fix chunk allocate size calculation Btrfs: kill max_extent mount option Btrfs: fail to mount if we have problems reading the block groups Btrfs: check btrfs_get_extent return for IS_ERR() Btrfs: handle kmalloc() failure in inode lookup ioctl Btrfs: dereferencing freed memory Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk() Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree() Btrfs: Remove unnecessary finish_wait() in wait_current_trans() Btrfs: add NULL check for do_walk_down() Btrfs: remove duplicate include in ioctl.c Fix trivial conflict in fs/btrfs/compression.c due to slab.h include cleanups.
2010-04-05 | Btrfs: use add_to_page_cache_lru, use __page_cache_alloc | Nick Piggin | 1 | -14/+1
Pagecache pages should be allocated with __page_cache_alloc, so they obey pagecache memory policies. add_to_page_cache_lru is exported, so it should be used. Benefits over using a private pagevec: neater code, 128 bytes less stack used, percpu LRU ordering is preserved, and finally, the pagevec need not be flushed before returning, so batching may be shared with other LRU insertions. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-30 | include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h | Tejun Heo | 1 | -1/+0
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h, which in turn includes gfp.h, making everything defined by the two files universally available and complicating inclusion dependencies.
The percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that only the necessary includes are there, i.e. if only gfp is used, gfp.h; if slab is used, slab.h.
* When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surroundings. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.
2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition, while for others adding it to the implementation .h or embedding .c file was more appropriate. This step added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed, e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs, requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically editing them, as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored, as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq).
   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as a bisection point.
Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers, which should be easily discoverable on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-15 Btrfs: cache the extent state everywhere we possibly can V2 (Josef Bacik; 1 file, -23/+38)
This patch just goes through and fixes everybody that does

lock_extent()
blah
unlock_extent()

to use

lock_extent_bits()
blah
unlock_extent_cached()

and pass around an extent_state, so we only have to do the searches once per function. This gives me about a 3 mb/s boost on my random write test. I have not converted some things, like the relocation and ioctls, since they aren't heavily used and the relocation stuff is in the middle of being rewritten. I also changed clear_extent_bit() to only unset the cached state if we are clearing EXTENT_LOCKED and related stuff, so we can do things like

lock_extent_bits()
clear delalloc bits
unlock_extent_cached()

without losing our cached state. I tested this thoroughly and turned on LEAK_DEBUG to make sure we weren't leaking extent states; everything worked out fine.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
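The benefit of passing a cached extent_state between the lock and unlock calls can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct toy_state`, `struct toy_tree`, `toy_lock_range()` and `toy_unlock_range()` are stand-ins (a flat array instead of the real rb-tree), not btrfs code. The point it shows is only that the second call can skip its search when the caller hands back the state found by the first.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct extent_state: one tracked byte range. */
struct toy_state {
    unsigned long start;   /* first byte covered (inclusive) */
    unsigned long end;     /* last byte covered (inclusive) */
    int locked;
};

/* Hypothetical stand-in for an extent_io_tree: a flat array of states. */
struct toy_tree {
    struct toy_state *states;
    size_t nr;
    unsigned long searches;   /* number of full lookups performed */
};

/* Return the state covering 'pos', reusing *cached when it still covers it. */
static struct toy_state *toy_find(struct toy_tree *tree, unsigned long pos,
                                  struct toy_state **cached)
{
    if (cached && *cached && (*cached)->start <= pos && pos <= (*cached)->end)
        return *cached;                  /* cache hit: no search needed */

    tree->searches++;
    for (size_t i = 0; i < tree->nr; i++) {
        struct toy_state *s = &tree->states[i];
        if (s->start <= pos && pos <= s->end) {
            if (cached)
                *cached = s;             /* remember it for the next call */
            return s;
        }
    }
    return NULL;
}

/* Analogue of lock_extent_bits(): lock and optionally hand back the state. */
static int toy_lock_range(struct toy_tree *tree, unsigned long pos,
                          struct toy_state **cached)
{
    struct toy_state *s = toy_find(tree, pos, cached);
    if (!s)
        return -1;
    s->locked = 1;
    return 0;
}

/* Analogue of unlock_extent_cached(): reuse the cached state if we have one,
 * then drop the cached pointer since the caller is done with it. */
static int toy_unlock_range(struct toy_tree *tree, unsigned long pos,
                            struct toy_state **cached)
{
    struct toy_state *s = toy_find(tree, pos, cached);
    if (!s)
        return -1;
    s->locked = 0;
    if (cached)
        *cached = NULL;
    return 0;
}
```

With a `struct toy_state *cached = NULL` threaded through a lock/unlock pair, the toy tree records one search; passing `NULL` instead costs two, which mirrors the "searches once per function" claim in the commit message.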
2010-03-15 Btrfs: cache extent state in find_delalloc_range (Josef Bacik; 1 file, -3/+8)
This patch makes us cache the extent state we find in find_delalloc_range, since we'll have to lock the extent later on in the function. This will keep us from re-searching for the range when we try to lock the extent.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15 Btrfs: finish read pages in the order they are submitted (Chris Mason; 1 file, -3/+4)
The endio is done in reverse order of the bio vectors. That means for a sequential read, the page submitted first will finish last in a bio. Considering we will do a checksum (making the cache hot) for every page, this introduces a delay for pages submitted at the beginning, and a chance that cache lines which will soon be used get evicted. I don't observe an obvious performance difference with this patch in my simple test, but it seems more natural to finish reads in the order they are submitted.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
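The ordering change can be pictured with a toy model of a bio's vector array. This is a sketch under stated assumptions: `struct toy_bvec`, `complete_reverse()` and `complete_forward()` are hypothetical and only record the order in which pages would be "finished"; they are not the end_bio_extent_readpage() code itself. Indices are used rather than a pointer walking below the array start, which would be undefined behaviour in portable C.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: only the page index it maps. */
struct toy_bvec {
    unsigned long page_index;
};

/* Old order: walk the vector array from the last entry to the first,
 * recording which page is "finished" at each step. */
static size_t complete_reverse(const struct toy_bvec *vecs, size_t vcnt,
                               unsigned long *order)
{
    for (size_t i = vcnt; i > 0; i--)
        order[vcnt - i] = vecs[i - 1].page_index;
    return vcnt;
}

/* New order: walk the vector array in submission order. */
static size_t complete_forward(const struct toy_bvec *vecs, size_t vcnt,
                               unsigned long *order)
{
    for (size_t i = 0; i < vcnt; i++)
        order[i] = vecs[i].page_index;
    return vcnt;
}
```

For a sequential read of pages 10, 11, 12, the reverse walk finishes page 10 last, while the forward walk finishes it first, which is the behaviour the commit argues is more cache-friendly.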
2010-03-08 Btrfs: use RB_ROOT to intialize rb_trees instead of setting rb_node to NULL (Eric Paris; 1 file, -2/+2)
btrfs initializes rb trees in quite a number of places by setting rb_node = NULL. The problem with this is that 17d9ddc72fb8bba0d4f678 in the linux-next tree adds a new field to that struct which needs to be NULL for the new rbtree library code to work properly. This patch uses RB_ROOT as the initializer so all of the relevant fields will be NULL'd. Without the patch I get a panic.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
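Why assigning a whole-struct initializer is safer than clearing one member can be shown in plain C. The sketch below is hypothetical: `struct toy_rb_root` and its `extra` member stand in for an rb_root that has grown a field, and `TOY_RB_ROOT` copies only the general shape of the kernel's `RB_ROOT` macro (a compound literal, where any member not named is zero-initialized).

```c
#include <stddef.h>
#include <string.h>

struct toy_rb_node;   /* node layout is irrelevant to this sketch */

/* Hypothetical root that has gained a second member, as in the commit. */
struct toy_rb_root {
    struct toy_rb_node *rb_node;
    void *extra;                 /* hypothetical new field that must start NULL */
};

/* Same shape as the kernel's RB_ROOT: a compound literal in which every
 * member not named explicitly is zero-initialized. */
#define TOY_RB_ROOT (struct toy_rb_root) { NULL, }

/* Fill the struct with junk to mimic memory that was never cleared. */
static void scribble(struct toy_rb_root *root)
{
    memset(root, 0xff, sizeof(*root));
}

/* Old pattern: only the member that existed at the time is cleared. */
static void init_by_field(struct toy_rb_root *root)
{
    root->rb_node = NULL;
}

/* New pattern: assign the whole initializer, clearing every member. */
static void init_by_macro(struct toy_rb_root *root)
{
    *root = TOY_RB_ROOT;
}
```

After `scribble()`, `init_by_field()` leaves `extra` holding junk, whereas `init_by_macro()` clears both members; code written against the old struct layout keeps working when new fields are added.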
2010-02-04 Btrfs: fix race between allocate and release extent buffer. (Yan, Zheng; 1 file, -2/+1)
Increase the extent buffer's reference count while holding the lock. Otherwise it can race with try_release_extent_buffer.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
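The pattern behind this fix (take the reference inside the same critical section that makes the buffer visible, so a concurrent release that checks the count under that lock cannot free it in between) can be sketched in user space. All names here are hypothetical: `struct toy_eb`, `struct toy_cache` and the `cache_*()` helpers are not btrfs functions, and a C11 `atomic_flag` spinlock stands in for the kernel spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for an extent buffer. */
struct toy_eb {
    int refs;      /* protected by toy_cache.lock */
    bool freed;
};

/* Hypothetical cache holding at most one buffer, guarded by a spinlock. */
struct toy_cache {
    atomic_flag lock;
    struct toy_eb *eb;
};

static void cache_lock(struct toy_cache *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void cache_unlock(struct toy_cache *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Lookup: take the reference while still holding the lock, so a concurrent
 * release cannot free the buffer between the lookup and the increment. */
static struct toy_eb *cache_find_get(struct toy_cache *c)
{
    struct toy_eb *eb;

    cache_lock(c);
    eb = c->eb;
    if (eb)
        eb->refs++;
    cache_unlock(c);
    return eb;
}

/* Drop a reference taken by cache_find_get(). */
static void cache_put(struct toy_cache *c, struct toy_eb *eb)
{
    cache_lock(c);
    eb->refs--;
    cache_unlock(c);
}

/* Release: only free the buffer if the cache holds the sole reference. */
static bool cache_try_release(struct toy_cache *c)
{
    bool released = false;

    cache_lock(c);
    if (c->eb && c->eb->refs == 1) {
        c->eb->refs = 0;
        c->eb->freed = true;   /* stand-in for actually freeing it */
        c->eb = NULL;
        released = true;
    }
    cache_unlock(c);
    return released;
}
```

If the increment in `cache_find_get()` happened after `cache_unlock()`, a release running in that window would see `refs == 1` and free a buffer the lookup is about to use; doing both the check and the increment under the lock closes that window.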
2009-10-08 Btrfs: release delalloc reservations on extent item insertion (Josef Bacik; 1 file, -6/+13)
This patch fixes an issue with the delalloc metadata space reservation code. The problem is that we used to free the reservation as soon as we allocated the delalloc region. If we are not inserting an inline extent, however, we don't actually insert the extent item until after the ordered extent is written out. This patch does 3 things:

1) It moves the reservation clearing stuff into the ordered code, so when we remove the ordered extent we remove the reservation.
2) It adds an EXTENT_DO_ACCOUNTING flag that gets passed when we clear delalloc bits in the cases where we want to clear the metadata reservation along with the delalloc extent, i.e. when we do an inline extent or we invalidate the page.
3) It adds another waitqueue to the space info so that when we start a fs-wide delalloc flush, anybody else who also hits that area will simply wait for the flush to finish and then try to make their allocation.

This has been tested thoroughly to make sure we did not regress on performance.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
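Points 1) and 2) above can be modelled as a toy accounting scheme: a range keeps its metadata reservation until either the ordered extent goes away or the caller explicitly asks for accounting while clearing delalloc. The sketch is hypothetical throughout: `TOY_EXTENT_DELALLOC`, `TOY_EXTENT_DO_ACCOUNTING`, their bit values, and the structs and functions are illustrative stand-ins, not the real btrfs definitions.

```c
/* Hypothetical bit values; the real btrfs flag values are not shown here. */
#define TOY_EXTENT_DELALLOC      (1u << 0)
#define TOY_EXTENT_DO_ACCOUNTING (1u << 1)

/* Hypothetical per-filesystem counter of reserved metadata bytes. */
struct toy_space_info {
    unsigned long reserved_metadata;
};

/* Hypothetical delalloc range carrying its own metadata reservation. */
struct toy_range {
    unsigned int bits;
    unsigned long meta_reserved;
};

/* Give a range's reservation back, at most once. */
static void release_reservation(struct toy_space_info *si, struct toy_range *r)
{
    si->reserved_metadata -= r->meta_reserved;
    r->meta_reserved = 0;
}

/* Clearing delalloc bits only releases the reservation when the caller
 * passes the accounting flag (e.g. inline extent or invalidated page). */
static void toy_clear_bits(struct toy_space_info *si, struct toy_range *r,
                           unsigned int bits)
{
    if ((bits & TOY_EXTENT_DO_ACCOUNTING) && r->meta_reserved)
        release_reservation(si, r);
    r->bits &= ~(bits & ~TOY_EXTENT_DO_ACCOUNTING);
}

/* Normal path: the reservation goes away with the ordered extent,
 * i.e. after the extent item has actually been inserted. */
static void toy_remove_ordered_extent(struct toy_space_info *si,
                                      struct toy_range *r)
{
    if (r->meta_reserved)
        release_reservation(si, r);
}
```

In this model, clearing `TOY_EXTENT_DELALLOC` alone leaves the reservation in place until `toy_remove_ordered_extent()` runs, while adding `TOY_EXTENT_DO_ACCOUNTING` releases it immediately. The waitqueue from point 3) is not modelled.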
2009-10-08 Btrfs: cleanup extent_clear_unlock_delalloc flags (Chris Mason; 1 file, -16/+11)
extent_clear_unlock_delalloc has a growing set of ugly parameters that are very difficult to read and maintain. This switches to a flag field and well-named flag defines.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
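The readability gain of replacing a run of boolean parameters with a single flag word can be sketched as follows. The flag names, `struct toy_page`, `toy_clear_unlock()` and `toy_clear_unlock_bools()` are all hypothetical, chosen for illustration only; they are not the defines or signature introduced by this commit.

```c
#include <stdbool.h>

/* Hypothetical flag names for this sketch only. */
#define TOY_CLEAR_UNLOCK_PAGE (1ul << 0)
#define TOY_CLEAR_DELALLOC    (1ul << 1)
#define TOY_SET_WRITEBACK     (1ul << 2)
#define TOY_END_WRITEBACK     (1ul << 3)

struct toy_page {
    bool locked;
    bool delalloc;
    bool writeback;
};

/* Flag-field style: each call site names the operations it wants. */
static void toy_clear_unlock(struct toy_page *pg, unsigned long op)
{
    if (op & TOY_CLEAR_DELALLOC)
        pg->delalloc = false;
    if (op & TOY_SET_WRITEBACK)
        pg->writeback = true;
    if (op & TOY_END_WRITEBACK)
        pg->writeback = false;
    if (op & TOY_CLEAR_UNLOCK_PAGE)
        pg->locked = false;
}

/* Positional-boolean style, kept only for comparison: a call such as
 * toy_clear_unlock_bools(pg, 1, 0, 1, 0) does not say which int is which. */
static void toy_clear_unlock_bools(struct toy_page *pg, int unlock_page,
                                   int clear_delalloc, int set_writeback,
                                   int end_writeback)
{
    unsigned long op = 0;

    if (unlock_page)
        op |= TOY_CLEAR_UNLOCK_PAGE;
    if (clear_delalloc)
        op |= TOY_CLEAR_DELALLOC;
    if (set_writeback)
        op |= TOY_SET_WRITEBACK;
    if (end_writeback)
        op |= TOY_END_WRITEBACK;
    toy_clear_unlock(pg, op);
}
```

A call like `toy_clear_unlock(pg, TOY_CLEAR_UNLOCK_PAGE | TOY_CLEAR_DELALLOC | TOY_SET_WRITEBACK)` documents itself, and adding a new operation means adding one define rather than another positional argument at every call site.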