linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2011-07-25	vfs: move ACL cache lookup into generic code	Linus Torvalds	1	-6/+2
	This moves logic for checking the cached ACL values from low-level filesystems into generic code. The end result is a streamlined ACL check that doesn't need to load the inode->i_op->check_acl pointer at all for the common cached case. The filesystems also don't need to check for a non-blocking RCU walk case in their acl_check() functions, because that is all handled at a VFS layer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25	xfs: Fix wrong return value of xfs_file_aio_write	Markus Trippelsdorf	1	-1/+4
	The fsync prototype change commit 02c24a82187d accidentally overwrote the ssize_t return value of xfs_file_aio_write with 0 for SYNC type writes. Fix this by checking if an error occured when calling xfs_file_fsync and only change the return value in this case. In addition xfs_file_fsync actually returns a normal negative error, so fix this, too. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	Linus Torvalds	6	-68/+59
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits) vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp isofs: Remove global fs lock jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory fix IN_DELETE_SELF on overwriting rename() on ramfs et.al. mm/truncate.c: fix build for CONFIG_BLOCK not enabled fs:update the NOTE of the file_operations structure Remove dead code in dget_parent() AFS: Fix silly characters in a comment switch d_add_ci() to d_splice_alias() in "found negative" case as well simplify gfs2_lookup() jfs_lookup(): don't bother with . or .. get rid of useless dget_parent() in btrfs rename() and link() get rid of useless dget_parent() in fs/btrfs/ioctl.c fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers drivers: fix up various ->llseek() implementations fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek Ext4: handle SEEK_HOLE/SEEK_DATA generically Btrfs: implement our own ->llseek fs: add SEEK_HOLE and SEEK_DATA flags reiserfs: make reiserfs default to barrier=flush ... Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new shrinker callout for the inode cache, that clashed with the xfs code to start the periodic workers later.
2011-07-20	fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers	Josef Bacik	1	-9/+8
	Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20	fs: move inode_dio_done to the end_io handler	Christoph Hellwig	1	-0/+3
	For filesystems that delay their end_io processing we should keep our i_dio_count until the the processing is done. Enable this by moving the inode_dio_done call to the end_io handler if one exist. Note that the actual move to the workqueue for ext4 and XFS is not done in this patch yet, but left to the filesystem maintainers. At least for XFS it's not needed yet either as XFS has an internal equivalent to i_dio_count. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20	xfs: make use of new shrinker callout for the inode cache	Dave Chinner	3	-56/+46
	Convert the inode reclaim shrinker to use the new per-sb shrinker operations. This allows much bigger reclaim batches to be used, and allows the XFS inode cache to be shrunk in proportion with the VFS dentry and inode caches. This avoids the problem of the VFS caches being shrunk significantly before the XFS inode cache is shrunk resulting in imbalances in the caches during reclaim. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20	xfs: add size update tracepoint to IO completion	Dave Chinner	2	-4/+9
	For improving insight into IO completion behaviour. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-07-20	xfs: failure mapping nfs fh to inode should return ESTALE	J. Bruce Fields	1	-2/+2
	On xfs exports, nfsd is incorrectly returning ENOENT instead of ESTALE on attempts to use a filehandle of a deleted file (spotted with pynfs test PUTFH3). The ENOENT was coming from xfs_iget. (It's tempting to wonder whether we should just map all xfs_iget errors to ESTALE, but I don't believe so--xfs_iget can also return ENOMEM at least, which we wouldn't want mapped to ESTALE.) While we're at it, the other return of ENOENT in xfs_nfs_get_inode() also looks wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-07-20	xfs: Remove the second parameter to xfs_sb_count()	Chandra Seetharaman	1	-1/+1
	Remove the second parameter to xfs_sb_count() since all callers of the function set them. Also, fix the header comment regarding it being called periodically. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-07-20	->permission() sanitizing: don't pass flags to ->check_acl()	Al Viro	1	-1/+1
	not used in the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20	->permission() sanitizing: pass MAY_NOT_BLOCK to ->check_acl()	Al Viro	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-13	xfs: remove leftovers of the old btree tracing code	Christoph Hellwig	1	-1/+0
	Remove various bits left over from the old kdb-only btree tracing code, but leave the actual trace point stubs in place to ease adding new event based btree tracing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-13	xfs: remove the unused xfs_buf_delwri_sort function	Christoph Hellwig	1	-8/+0
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-13	xfs: remove wrappers around b_iodone	Christoph Hellwig	2	-5/+1
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-13	xfs: remove wrappers around b_fspriv	Christoph Hellwig	1	-2/+0
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-13	xfs: add a proper transaction pointer to struct xfs_buf	Christoph Hellwig	1	-3/+1
	Replace the typeless b_fspriv2 and the ugly macros around it with a properly typed transaction pointer. As a fallout the log buffer state debug checks are also removed. We could have kept them using casts, but as they do not have a real purpose we can as well just remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-13	xfs: start periodic workers later	Christoph Hellwig	1	-21/+14
	Start the periodic sync workers only after we have finished xfs_mountfs and thus fully set up the filesystem structures. Without this we can call into xfs_qm_sync before the quotainfo strucute is set up if the mount takes unusually long, and probably hit other incomplete states as well. Also clean up the xfs_fs_fill_super error path by using consistent label names, and removing an impossible to reach case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-07-08	xfs: cleanup I/O-related buffer flags	Christoph Hellwig	2	-40/+35
	Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and XBF_FLUSH to allow more fine grained control over the bio flags. Also cleanup processing of the flags in _xfs_buf_ioapply to make more sense, and renumber the sparse flag number space to group flags by purpose. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: return the buffer locked from xfs_buf_get_uncached	Christoph Hellwig	1	-3/+0
	All other xfs_buf_get/read-like helpers return the buffer locked, make sure xfs_buf_get_uncached isn't different for no reason. Half of the callers already lock it directly after, and the others probably should also keep it locked if only for consistency and beeing able to use xfs_buf_rele, but I'll leave that for later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: clean up buffer locking helpers	Christoph Hellwig	3	-31/+19
	Rename xfs_buf_cond_lock and reverse it's return value to fit most other trylock operations in the Kernel and XFS (with the exception of down_trylock, after which xfs_buf_cond_lock was modelled), and replace xfs_buf_lock_val with an xfs_buf_islocked for use in asserts, or and opencoded variant in tracing. remove the XFS_BUF_* wrappers for all the locking helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: remove the unused xfs_bufhash structure	Christoph Hellwig	1	-5/+0
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: use generic get_unaligned_beXX helpers	Christoph Hellwig	1	-1/+6
	Switch the shortform directory code over to use the generic get_unaligned_beXX helpers instead of reinventing them. As a result kill off xfs_arch.h and move the setting of XFS_NATIVE_HOST into xfs_linux.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: kill the unused struct xfs_sync_work	Christoph Hellwig	1	-8/+0
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: improve sync behaviour in the face of aggressive dirtying	Christoph Hellwig	1	-5/+3
	The following script from Wu Fengguang shows very bad behaviour in XFS when aggressively dirtying data during a sync on XFS, with sync times up to almost 10 times as long as ext4. A large part of the issue is that XFS writes data out itself two times in the ->sync_fs method, overriding the livelock protection in the core writeback code, and another issue is the lock-less xfs_ioend_wait call, which doesn't prevent new ioend from being queue up while waiting for the count to reach zero. This patch removes the XFS-internal sync calls and relies on the VFS to do it's work just like all other filesystems do. Note that the i_iocount wait which is rather suboptimal is simply removed here. We already do it in ->write_inode, which keeps the current supoptimal behaviour. We'll eventually need to remove that as well, but that's material for a separate commit. ------------------------------ snip ------------------------------ #!/bin/sh umount /dev/sda7 mkfs.xfs -f /dev/sda7 # mkfs.ext4 /dev/sda7 # mkfs.btrfs /dev/sda7 mount /dev/sda7 /fs echo $((50<<20)) > /proc/sys/vm/dirty_bytes pid= for i in `seq 10` do dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 & pid="$pid $!" done sleep 1 tic=$(date +'%s') sync tac=$(date +'%s') echo echo sync time: $((tac-tic)) egrep '(Dirty\|Writeback\|NFS_Unstable)' /proc/meminfo pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; } ------------------------------ snip ------------------------------ Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: split xfs_itruncate_finish	Christoph Hellwig	2	-11/+3
	Split the guts of xfs_itruncate_finish that loop over the existing extents and calls xfs_bunmapi on them into a new helper, xfs_itruncate_externs. Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish, which allows to simplify the latter a lot, by only letting it deal with the data fork. As a result xfs_itruncate_finish is renamed to xfs_itruncate_data to make its use case more obvious. Also remove the sync parameter from xfs_itruncate_data, which has been unessecary since the introduction of the busy extent list in 2002, and completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was made a no-op. I can't actually see why the xfs_attr_inactive needs to set the transaction sync, but let's keep this patch simple and without changes in behaviour. Also avoid passing a useless argument to xfs_isize_check, and make it private to xfs_inode.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: kill xfs_itruncate_start	Christoph Hellwig	1	-34/+0
	xfs_itruncate_start is a rather length wrapper that evaluates to a call to xfs_ioend_wait and xfs_tosspages, and only has two callers. Instead of using the complicated checks left over from IRIX where we can to truncate the pagecache just call xfs_tosspages (aka truncate_inode_pages) directly as we want to get rid of all data after i_size, and truncate_inode_pages handles incorrect alignments and too large offsets just fine. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: always log timestamp updates in xfs_setattr_size	Christoph Hellwig	1	-8/+9
	Get rid of the special case where we use unlogged timestamp updates for a truncate to the current inode size, and just call xfs_setattr_nonsize for it to treat it like a utimes calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: split xfs_setattr	Christoph Hellwig	3	-3/+441
	Split up xfs_setattr into two functions, one for the complex truncate handling, and one for the trivial attribute updates. Also move both new routines to xfs_iops.c as they are fairly Linux-specific. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: re-enable non-blocking behaviour in xfs_map_blocks	Christoph Hellwig	1	-1/+1
	The non-blockig behaviour in xfs_vm_writepage currently is conditional on having both the WB_SYNC_NONE sync_mode and the nonblocking flag set. The latter used to be used by both pdflush, kswapd and a few other places in older kernels, but has been fading out starting with the introduction of the per-bdi flusher threads. Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back the behaviour we want. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-07-08	xfs: PF_FSTRANS should never be set in ->writepage	Christoph Hellwig	1	-14/+3
	Now that we reject direct reclaim in addition to always using GFP_NOFS allocation there's no chance we'll ever end up in ->writepage with PF_FSTRANS set. Add a WARN_ON if we hit this case, and stop checking if we'd actually need to start a transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-06-16	xfs: make log devices with write back caches work	Christoph Hellwig	2	-94/+31
	There's no reason not to support cache flushing on external log devices. The only thing this really requires is flushing the data device first both in fsync and log commits. A side effect is that we also have to remove the barrier write test during mount, which has been superflous since the new FLUSH+FUA code anyway. Also use the chance to flush the RT subvolume write cache before the fsync commit, which is required for correct semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-06-14	xfs: fix ->mknod() return value on xfs_get_acl() failure	Al Viro	1	-1/+1
	->mknod() should return negative on errors and PTR_ERR() gives already negative value... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-27	fs: pass exact type of data dirties to ->dirty_inode	Christoph Hellwig	1	-1/+2
	Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or anything else, so that the filesystem can track internally if it needs to push out a transaction for fdatasync or not. This is just the prototype change with no user for it yet. I plan to push large XFS changes for the next merge window, and getting this trivial infrastructure in this window would help a lot to avoid tree interdependencies. Also remove incorrect comments that ->dirty_inode can't block. That has been changed a long time ago, and many implementations rely on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs	Linus Torvalds	3	-2/+47
	* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent xfs: check for valid indices in xfs_iext_get_ext and xfs_iext_idx_to_irec xfs: fix up asserts in xfs_iflush_fork xfs: do not do pointer arithmetic on extent records xfs: do not use unchecked extent indices in xfs_bunmapi xfs: do not use unchecked extent indices in xfs_bmapi xfs: do not use unchecked extent indices in xfs_bmap_add_extent_* xfs: remove if_lastex xfs: remove the unused XFS_BMAPI_RSVBLOCKS flag xfs: do not discard alloc btree blocks xfs: add online discard support
2011-05-25	vmscan: change shrinker API by passing shrink_control struct	Ying Han	2	-4/+5
	Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-24	xfs: add online discard support	Christoph Hellwig	3	-2/+47
	Now that we have reliably tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits. The actual discard is a two stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19	xfs: reset buffer pointers before freeing them	Dave Chinner	2	-0/+22
	When we free a vmapped buffer, we need to ensure the vmap address and length we free is the same as when it was allocated. In various places in the log code we change the memory the buffer is pointing to before issuing IO, but we never reset the buffer to point back to it's original memory (or no memory, if that is the case for the buffer). As a result, when we free the buffer it points to memory that is owned by something else and attempts to unmap and free it. Because the range does not match any known mapped range, it can trigger BUG_ON() traps in the vmap code, and potentially corrupt the vmap area tracking. Fix this by always resetting these buffers to their original state before freeing them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19	xfs: avoid getting stuck during async inode flushes	Dave Chinner	1	-0/+10
	When the underlying inode buffer is locked and xfs_sync_inode_attr() is doing a non-blocking flush, xfs_iflush() can return EAGAIN. When this happens, clear the error rather than returning it to xfs_inode_ag_walk(), as returning EAGAIN will result in the AG walk delaying for a short while and trying again. This can result in background walks getting stuck on the one AG until inode buffer is unlocked by some other means. This behaviour was noticed when analysing event traces followed by code inspection and verification of the fix via further traces. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19	xfs: fix duplicate workqueue initialisation	Dave Chinner	1	-4/+0
	The workqueue initialisation function is called twice when initialising the XFS subsystem. Remove the second initialisation call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19	xfs: kill off xfs_printk()	Joe Perches	2	-23/+4
	xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid using xfs_printk() altogether. This is the only remaining use of xfs_printk(), so changing it this way means xfs_printk() can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated. Also add format checking to the non-debug inline function xfs_debug. Miscellaneous function prototype argument alignment. (Updated to delete the definition of xfs_printk(), which is no longer used or needed.) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-09	xfs: ensure reclaim cursor is reset correctly at end of AG	Dave Chinner	1	-0/+1
	On a 32 bit highmem PowerPC machine, the XFS inode cache was growing without bound and exhausting low memory causing the OOM killer to be triggered. After some effort, the problem was reproduced on a 32 bit x86 highmem machine. The problem is that the per-ag inode reclaim index cursor was not getting reset to the start of the AG if the radix tree tag lookup found no more reclaimable inodes. Hence every further reclaim attempt started at the same index beyond where any reclaimable inodes lay, and no further background reclaim ever occurred from the AG. Without background inode reclaim the VM driven cache shrinker simply cannot keep up with cache growth, and OOM is the result. While the change that exposed the problem was the conversion of the inode reclaim to use work queues for background reclaim, it was not the cause of the bug. The bug was introduced when the cursor code was added, just waiting for some weird configuration to strike.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-By: Christian Kujau <lists@nerdbynature.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-04-28	xfs: add an x86 compat handler for XFS_IOC_ZERO_RANGE	Christoph Hellwig	2	-1/+3
	XFS_IOC_ZERO_RANGE uses struct xfs_flock64, and thus requires argument translation for 32-bit binaries on x86. Add the required XFS_IOC_ZERO_RANGE_32 defined and add it to the list of commands that require xfs_flock64 translation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28	xfs: fix compiler warning in xfs_trace.h	Christoph Hellwig	1	-1/+1
	xfs_fsblock_t may be a 32-bit type on if XFS_BIG_BLKNOS is not set, make sure to cast a value of this type to an unsigned long long before using the ll printk qualifier. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28	xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy	Christoph Hellwig	2	-1/+1
	Instead of finding the per-ag and then taking and releasing the pagb_lock for every single busy extent completed sort the list of busy extents and only switch betweens AGs where nessecary. This becomes especially important with the online discard support which will hit this lock more often. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28	xfs: exact busy extent tracking	Christoph Hellwig	1	-68/+11
	Update the extent tree in case we have to reuse a busy extent, so that it always is kept uptodate. This is done by replacing the busy list searches with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree in case of a reuse. This allows us to allow reusing metadata extents unconditionally, and thus avoid log forces especially for allocation btree blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28	xfs: do not immediately reuse busy extent ranges	Christoph Hellwig	1	-0/+33
	Every time we reallocate a busy extent, we cause a synchronous log force to occur to ensure the freeing transaction is on disk before we continue and use the newly allocated extent. This is extremely sub-optimal as we have to mark every transaction with blocks that get reused as synchronous. Instead of searching the busy extent list after deciding on the extent to allocate, check each candidate extent during the allocation decisions as to whether they are in the busy list. If they are in the busy list, we trim the busy range out of the extent we have found and determine if that trimmed range is still OK for allocation. In many cases, this check can be incorporated into the allocation extent alignment code which already does trimming of the found extent before determining if it is a valid candidate for allocation. Based on earlier patches from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-20	xfs: fix duplicate message output	Dave Chinner	1	-1/+3
	Commit 957935dc ("xfs: fix xfs_debug warnings" broke the logic in __xfs_printk(). Instead of only printing one of two possible output strings based on whether the fs has a name or not, it outputs both. Fix it to only output one message again. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-11	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs	Linus Torvalds	6	-240/+194
	* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: use proper interfaces for on-stack plugging xfs: fix xfs_debug warnings xfs: fix variable set but not used warnings xfs: convert log tail checking to a warning xfs: catch bad block numbers freeing extents. xfs: push the AIL from memory reclaim and periodic sync xfs: clean up code layout in xfs_trans_ail.c xfs: convert the xfsaild threads to a workqueue xfs: introduce background inode reclaim work xfs: convert ENOSPC inode flushing to use new syncd workqueue xfs: introduce a xfssyncd workqueue xfs: fix extent format buffer allocation size xfs: fix unreferenced var error in xfs_buf.c Also, applied patch from Tony Luck that fixes ia64: xfs_destroy_workqueues() should not be tagged with__exit in the branch before merging.
2011-04-11	xfs_destroy_workqueues() should not be tagged with__exit	Luck, Tony	1	-1/+1
	ia64 throws away .exit sections for the built-in CONFIG case, so routines that are used in other circumstances should not be tagged as __exit. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-08	xfs: use proper interfaces for on-stack plugging	Christoph Hellwig	1	-11/+9
	Add proper blk_start_plug/blk_finish_plug pairs for the two places where we issue buffer I/O, and remove the blk_flush_plug in xfs_buf_lock and xfs_buf_iowait, given that context switches already flush the per-process plugging lists. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>