aboutsummaryrefslogtreecommitdiffstats
path: root/fs/nfs/blocklayout (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-09-21pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripeTrond Myklebust1-3/+6
kbuild test robot reports: fs/built-in.o: In function `bl_map_stripe': >> :(.text+0x965b4): undefined reference to `__aeabi_uldivmod' >> :(.text+0x965cc): undefined reference to `__aeabi_uldivmod' >> :(.text+0x96604): undefined reference to `__aeabi_uldivmod' Fixes: 5c83746a0cf2 (pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing) Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-19sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule()Kirill Tkhai2-2/+0
schedule(), io_schedule() and schedule_timeout() always return with TASK_RUNNING state set, so one more setting is unnecessary. (All places in patch are visible good, only exception is kiblnd_scheduler() from: drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c Its schedule() is one line above standard 3 lines of unified diff) No places where set_current_state() is used for mb(). Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai Cc: Alasdair Kergon <agk@redhat.com> Cc: Anil Belur <askb23@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: David Airlie <airlied@linux.ie> Cc: David Howells <dhowells@redhat.com> Cc: Dmitry Eremin <dmitry.eremin@intel.com> Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Isaac Huang <he.huang@intel.com> Cc: James E.J. Bottomley <JBottomley@parallels.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Liang Zhen <liang.zhen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Masaru Nomura <massa.nomura@gmail.com> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleg Drokin <green@linuxhacker.ru> Cc: Peng Tao <bergwolf@gmail.com> Cc: Richard Weinberger <richard@nod.at> Cc: Robert Love <robert.w.love@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Ursula Braun <ursula.braun@de.ibm.com> Cc: Zi Shen Lim <zlim.lnx@gmail.com> Cc: devel@driverdev.osuosl.org Cc: dm-devel@redhat.com Cc: dri-devel@lists.freedesktop.org Cc: fcoe-devel@open-fcoe.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux390@de.ibm.com Cc: linux-afs@lists.infradead.org Cc: linux-cris-kernel@axis.com Cc: linux-kernel@vger.kernel.org Cc: linux-nfs@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linux-raid@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: qla2xxx-upstream@qlogic.com Cc: user-mode-linux-devel@lists.sourceforge.net Cc: user-mode-linux-user@lists.sourceforge.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-15pnfs/blocklayout: include vmalloc.h for __vmallocStephen Rothwell1-0/+2
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12pNFS/blocklayout: Remove a couple of unused variablesTrond Myklebust2-4/+1
Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsingChristoph Hellwig5-148/+530
This patches moves parsing of the GETDEVICEINFO XDR to kernel space, as well as the management of complex devices. The reason for that is we might have multiple outstanding complex devices after a NOTIFY_DEVICEID4_CHANGE, which device mapper or md can't handle as they claim devices exclusively. But as is turns out simple striping / concatenation is fairly trivial to implement anyway, so we make our life simpler by reducing the reliance on blkmapd. For now we still use blkmapd by feeding it synthetic SIMPLE device XDR to translate device signatures to device numbers, but in the long runs I have plans to eliminate it entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12pnfs/blocklayout: move all rpc_pipefs related code into a single fileChristoph Hellwig6-408/+378
Create a file to house all the rpc_pipefs boilerplate code instead of sprinkling it over a few files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12pnfs/blocklayout: refactor extent processingChristoph Hellwig1-102/+105
Factor out a helper for all per-extent work, and merge the now trivial functions for lseg allocation and parsing. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12pnfs/blocklayout: move extent processing to blocklayout.cChristoph Hellwig3-188/+186
This isn't device(id) related, so move it into the main file. Simple move for now, the next commit will clean it up a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12pnfs/blocklayout: allocate separate pages for the layoutcommit payloadChristoph Hellwig3-34/+91
Instead of overflowing the XDR send buffer with our extent list allocate pages and pre-encode the layoutupdate payload into them. We optimistically allocate a single page use alloc_page and only switch to vmalloc when we have more extents outstanding. Currently there is only a single testcase (xfstests generic/113) which can reproduce large enough extent lists for this to occur. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12pnfs: remove GETDEVICELIST implementationChristoph Hellwig1-1/+1
The current GETDEVICELIST implementation is buggy in that it doesn't handle cursors correctly, and in that it returns an error if the server returns NFSERR_NOTSUPP. Given that there is no actual need for GETDEVICELIST, it has various issues and might get removed for NFSv4.2 stop using it in the blocklayout driver, and thus the Linux NFS client as whole. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12pnfs/blocklayout: remove some debuggingChristoph Hellwig1-8/+0
The kbuild test robot complained that we got the printk format wrong. Let's just kill these printks instead of fixing them as there is not point after the initial tree algorithm debugging. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: use the device id cacheChristoph Hellwig5-250/+65
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: return layouts on setattrChristoph Hellwig1-1/+2
This speads up truncate-heavy workloads like fsx by multiple orders of magnitude. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: implement the return_range methodChristoph Hellwig1-0/+30
This allows removing extents from the extent tree especially on truncate operations, and thus fixing reads from truncated and re-extended that previously returned stale data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: rewrite extent trackingChristoph Hellwig6-1212/+651
Currently the block layout driver tracks extents in three separate data structures: - the two list of pnfs_block_extent structures returned by the server - the list of sectors that were in invalid state but have been written to - a list of pnfs_block_short_extent structures for LAYOUTCOMMIT All of these share the property that they are not only highly inefficient data structures, but also that operations on them are even more inefficient than nessecary. In addition there are various implementation defects like: - using an int to track sectors, causing corruption for large offsets - incorrect normalization of page or block granularity ranges - insufficient error handling - incorrect synchronization as extents can be modified while they are in use This patch replace all three data with a single unified rbtree structure tracking all extents, as well as their in-memory state, although we still need to instance for read-only and read-write extent due to the arcane client side COW feature in the block layouts spec. To fix the problem of extent possibly being modified while in use we make sure to return a copy of the extent for use in the write path - the extent can only be invalidated by a layout recall or return which has to wait until the I/O operations finished due to refcounts on the layout segment. The new extent tree work similar to the schemes used by block based filesystems like XFS or ext4. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: don't set pages uptodateChristoph Hellwig1-23/+1
The core nfs code handles setting pages uptodate on reads, no need to mess with the pageflags outselves. Also remove a debug function to dump page flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelistChristoph Hellwig1-435/+63
Use the new PNFS_READ_WHOLE_PAGE flag to offload read-modify-write handling to core nfs code, and remove a huge chunk of deadlock prone mess from the block layout writeback path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: correctly decrement extent lengthChristoph Hellwig1-3/+4
When we do non-page sized reads we can underflow the extent_length variable and read incorrect data. Fix the extent_length calculation and change to defensive <= checks for the extent length in the read and write path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: plug block queuesChristoph Hellwig1-0/+9
Make sure the block queue is plugged when performing pNFS blocklayout I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: improve GETDEVICEINFO error reportingChristoph Hellwig1-2/+3
Tell userspace what stage of GETDEVICEINFO failed so that there is a chance to debug it, especially with the userspace daemon clusterf***k in the block layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: reject pnfs blocksize larger than page sizeChristoph Hellwig1-0/+6
The Linux VM subsystem can't support block sizes larger than page size for block based filesystems very well. While this can be hacked around to some extent for simple filesystems the read-modify-write cycles required for pnfs block invalid extents are extremly deadlock prone when operating on multiple pages. Reject this case early on instead of pretending to support it (badly). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-25FS/NFS: replace count*size kzalloc by kcallocFabian Frederick1-1/+1
kcalloc manages count*sizeof overflow. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: remove unused writeverf codeWeston Andros Adamson1-1/+1
Remove duplicate writeverf structure from merge of nfs_pgio_header and nfs_pgio_data and remove writeverf related flags and logic to handle more than one RPC per nfs_pgio_header. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: merge nfs_pgio_data into _headerWeston Andros Adamson1-51/+47
struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is passed around everywhere, because there used to be multiple _data structs per _header. Many of these functions then use the _data to find a pointer to the _header. This patch cleans this up by merging the nfs_pgio_data structure into nfs_pgio_header and passing nfs_pgio_header around instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24nfs: rename members of nfs_pgio_dataWeston Andros Adamson1-7/+10
Rename "verf" to "writeverf" and "pages" to "page_array" to prepare for merge of nfs_pgio_data and nfs_pgio_header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29nfs: modify pg_test interface to return size_tWeston Andros Adamson1-4/+12
This is a step toward allowing pg_test to inform the the coalescing code to reduce the size of requests so they may fit in whatever scheme the pg_test callback wants to define. For now, just return the size of the request if there is space, or 0 if there is not. This shouldn't change any behavior as it acts the same as when the pg_test functions returned bool. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28NFS: Create a common read and write data structAnna Schumaker1-11/+11
At this point, the only difference between nfs_read_data and nfs_write_data is the write verifier. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-04-03mm: filemap: move radix tree hole searching hereJohannes Weiner1-1/+1
The radix tree hole searching code is only used for page cache, for example the readahead code trying to get a a picture of the area surrounding a fault. It sufficed to rely on the radix tree definition of holes, which is "empty tree slot". But this is about to change, though, as shadow page descriptors will be stored in the page cache after the actual pages get evicted from memory. Move the functions over to mm/filemap.c and make them native page cache operations, where they can later be adapted to handle the new definition of "page cache hole". Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-31Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe2-1/+2
Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
2013-12-04nfs: fix do_div() warning by instead using sector_div()Helge Deller1-1/+1
When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like shown below. Fix this warning by instead using sector_div() which is provided by the kernel.h header file. fs/nfs/blocklayout/extents.c: In function ‘normalize’: include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default] fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’ nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default] fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default] include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor); Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-23block: Abstract out bvec iteratorKent Overstreet1-4/+5
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
2013-11-23block: Convert various code to bio_for_each_segment()Kent Overstreet1-21/+13
With immutable biovecs we don't want code accessing bi_io_vec directly - the uses this patch changes weren't incorrect since they all own the bio, but it makes the code harder to audit for no good reason - also, this will help with multipage bvecs later. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-19NFS: Enabling v4.2 should not recompile nfsd and lockdAnna Schumaker1-0/+1
When CONFIG_NFS_V4_2 is toggled nfsd and lockd will be recompiled, instead of only the nfs client. This patch moves a small amount of code into the client directory to avoid unnecessary recompiles. Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcountAndy Adamson1-0/+1
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06NFSv4.1: Use layout credentials for get_deviceinfo callsTrond Myklebust1-1/+1
This is not strictly needed, since get_deviceinfo is not allowed to return NFS4ERR_ACCESS or NFS4ERR_WRONG_CRED, but lets do it anyway for consistency with other pNFS operations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-07make blkdev_put() return voidAl Viro3-10/+4
same story as with the previous patches - note that return value of blkdev_close() is lost, since there's nowhere the caller (__fput()) could return it to. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-21pnfs-block: removing DM device maybe cause oops when call dev_removefanchaoting1-2/+2
when pnfs block using device mapper,if umounting later,it maybe cause oops. we apply "1 + sizeof(bl_umount_request)" memory for msg->data, the memory maybe overflow when we do "memcpy(&dataptr [sizeof(bl_msg)], &bl_umount_request, sizeof(bl_umount_request))", because the size of bl_msg is more than 1 byte. Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-17umount oops when remove blocklayoutdriver firstfanchaoting1-0/+1
now pnfs client uses block layout, maybe we can remove blocklayoutdriver first. if we umount later, it can cause oops in unset_pnfs_layoutdriver. because nfss->pnfs_curr_ld->clear_layoutdriver is invalid. reproduce it: modprobe blocklayoutdriver mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/ rmmod blocklayoutdriver umount /mnt then you can see following CPU 0 Pid: 17023, comm: umount.nfs4 Tainted: GF O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa04cfe6d>] [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP: 0018:ffff8800022d9e48 EFLAGS: 00010286 RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001 RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800 RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400 R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550 FS: 00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0) Stack: ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7 ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400 Call Trace: [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4] [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs] [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs] [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70 [<ffffffff8117986a>] deactivate_super+0x4a/0x70 [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130 [<ffffffff81194d62>] sys_umount+0x72/0xe0 [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2 RIP [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP <ffff8800022d9e48> CR2: ffffffffa04a1b38 ---[ end trace 29f75aaedda058bf ]--- Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2012-12-06NFSv4.1: Move slot table and session struct definitions to nfs4session.hTrond Myklebust1-0/+1
Clean up. Gather NFSv4.1 slot definitions in fs/nfs/nfs4session.h. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08pnfsblock: cleanup nfs4_blkdev_getPeng Tao2-21/+5
It is not needed at all and it is messing with return values... Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08NFS41: send real write size in layoutgetPeng Tao1-3/+35
For buffer write, block layout client scan inode mapping to find next hole and use offset-to-hole as layoutget length. Object layout client uses offset-to-isize as layoutget length. For direct write, both block layout and object layout use dreq->bytes_left. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02NFSv4.1: bl_pg_init_write should be staticTrond Myklebust1-1/+1
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01pnfsblock: use list_move_tail instead of list_del/list_add_tailWei Yongjun1-2/+1
Using list_move_tail() instead of list_del() + list_add_tail(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01pnfsblock: fix non-aligned DIO writePeng Tao1-3/+31
For DIO writes, if it is not blocksize aligned, we need to do internal serialization. It may slow down writers anyway. So we just bail them out and resend to MDS. Cc: stable <stable@vger.kernel.org> [since v3.4] Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01pnfsblock: fix non-aligned DIO readPeng Tao1-8/+56
For DIO read, if it is not sector aligned, we should reject it and resend via MDS. Otherwise there might be data corruption. Also teach bl_read_pagelist to handle partial page reads for DIO. Cc: stable <stable@vger.kernel.org> [since v3.4] Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01pnfsblock: fix partial page buffer wirtePeng Tao2-12/+166
If applications use flock to protect its write range, generic NFS will not do read-modify-write cycle at page cache level. Therefore LD should know how to handle non-sector aligned writes. Otherwise there will be data corruption. Cc: stable <stable@vger.kernel.org> Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01Revert "pnfsblock: bail out partial page IO"Peng Tao1-36/+3
This reverts commit 159e0561e322dd8008fff59e36efff8d2bdd0b0e, in favor of a more complete fix to the alignment issue. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Replace get_device_info() with filelayout_get_device_info()Trond Myklebust1-1/+1
Fix the namespace pollution issue. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30pnfsblock: bail out partial page IOPeng Tao1-3/+36
Current block layout driver read/write code assumes page aligned IO in many places. Add a checker to validate the assumption. Otherwise there would be data corruption like when application does open(O_WRONLY) and page unaliged write. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Use proper naming conventions for the nfs_client.net fieldChuck Lever1-1/+1
Clean up: When naming fields and data types, follow established conventions to facilitate accurate grep/cscope searches. Introduced by commit e50a7a1a "NFS: make NFS client allocated per network namespace context," Tue Jan 10, 2012. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>