path: root/fs/nfs
Age | Commit message | Author | Files | Lines
2020-08-15 | Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 23 | -121/+2274
Pull NFS client updates from Trond Myklebust:
  "Stable fixes:
    - pNFS: Don't return layout segments that are being used for I/O
    - pNFS: Don't move layout segments off the active list when being used for I/O
   Features:
    - NFS: Add support for user xattrs through the NFSv4.2 protocol
    - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
    - NFSv4.0 allow nconnect for v4.0
   Bugfixes and cleanups:
    - nfs: ensure correct writeback errors are returned on close()
    - nfs: nfs_file_write() should check for writeback errors
    - nfs: Fix getxattr kernel panic and memory overflow
    - NFS: Fix the pNFS/flexfiles mirrored read failover code
    - SUNRPC: dont update timeout value on connection reset
    - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
    - sunrpc: destroy rpc_inode_cachep after unregister_filesystem"
* tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits)
  NFS: Fix flexfiles read failover
  fs: nfs: delete repeated words in comments
  rpc_pipefs: convert comma to semicolon
  nfs: Fix getxattr kernel panic and memory overflow
  NFS: Don't return layout segments that are in use
  NFS: Don't move layouts to plh_return_segs list while in use
  NFS: Add layout segment info to pnfs read/write/commit tracepoints
  NFS: Add tracepoints for layouterror and layoutstats.
  NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
  SUNRPC dont update timeout value on connection reset
  nfs: nfs_file_write() should check for writeback errors
  nfs: ensure correct writeback errors are returned on close()
  NFSv4.2: xattr cache: get rid of cache discard work queue
  NFS: remove redundant initialization of variable result
  NFSv4.0 allow nconnect for v4.0
  freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
  sunrpc: destroy rpc_inode_cachep after unregister_filesystem
  NFSv4.2: add client side xattr caching.
  NFSv4.2: hook in the user extended attribute handlers
  NFSv4.2: add the extended attribute proc functions.
  ...
2020-08-12 | NFS: Fix flexfiles read failover | Trond Myklebust | 3 | -16/+40
The current mirrored read failover code is correctly resetting the mirror index between failed reads, however it is not able to actually flip the RPC call over to the next RPC client. The end result is that we keep resending the RPC call to the same client over and over. The fix is to use the pnfs_read_resend_pnfs() mechanism to schedule a new RPC call, but we need to add the ability to pass in a mirror index so that we always retry the next mirror in the list. Fixes: 166bd5b889ac ("pNFS/flexfiles: Fix layoutstats handling during read failovers") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
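The failover behaviour described in this commit message can be illustrated with a small user-space sketch. This is not the kernel code: `struct mirror`, `read_from_mirror()` and `read_with_failover()` are hypothetical names, and wrapping around to the start of the list is an assumption of the sketch. The point it shows is that the retry must be issued against the next mirror index rather than the same one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one mirror's data server. */
struct mirror {
    int healthy; /* nonzero if a read sent to this mirror succeeds */
};

/* Pretend to issue one read; returns 0 on success, -1 on failure. */
static int read_from_mirror(const struct mirror *m)
{
    return m->healthy ? 0 : -1;
}

/*
 * Try each mirror at most once, starting at `start`. After a failure the
 * index is advanced so the resend targets the next mirror instead of the
 * one that just failed. Returns the index that served the read, or -1.
 */
static int read_with_failover(const struct mirror *mirrors, size_t count,
                              size_t start)
{
    assert(mirrors != NULL && count > 0);
    for (size_t attempt = 0; attempt < count; attempt++) {
        size_t idx = (start + attempt) % count;

        if (read_from_mirror(&mirrors[idx]) == 0)
            return (int)idx;
    }
    return -1; /* every mirror failed */
}
```

Passing the index explicitly into the resend path mirrors the idea in the message of adding "the ability to pass in a mirror index".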
2020-08-12 | fs: nfs: delete repeated words in comments | Randy Dunlap | 2 | -2/+2
Drop duplicated words {the, and} in comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-12 | rpc_pipefs: convert comma to semicolon | Xu Wang | 1 | -1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-12 | nfs: Fix getxattr kernel panic and memory overflow | Jeffrey Mitchell | 2 | -3/+5
Move the buffer size check to decode_attr_security_label() before memcpy() Only call memcpy() if the buffer is large enough Fixes: aa9c2669626c ("NFS: Client implementation of Labeled-NFS") Signed-off-by: Jeffrey Mitchell <jeffrey.mitchell@starlab.io> [Trond: clean up duplicate test of label->len != 0] Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
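The pattern behind this fix, checking the destination size before calling memcpy(), can be sketched in plain C as follows. `struct label_buf` and `copy_label()` are hypothetical names for illustration only, and the choice of -ERANGE as the error code is an assumption of the sketch, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical destination buffer, loosely modelled on a security label. */
struct label_buf {
    char   data[16];
    size_t len;
};

/*
 * Copy `len` bytes of decoded data into `dst`, checking the size first.
 * Returns 0 on success or -ERANGE if the data does not fit; in the error
 * case `dst` is left untouched.
 */
static int copy_label(struct label_buf *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > sizeof(dst->data))
        return -ERANGE; /* refuse before touching the buffer */
    memcpy(dst->data, src, len);
    dst->len = len;
    return 0;
}
```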
2020-08-12 | NFS: Don't return layout segments that are in use | Trond Myklebust | 1 | -19/+15
If the NFS_LAYOUT_RETURN_REQUESTED flag is set, we want to return the layout as soon as possible, meaning that the affected layout segments should be marked as invalid, and should no longer be in use for I/O. Fixes: f0b429819b5f ("pNFS: Ignore non-recalled layouts in pnfs_layout_need_return()") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-12 | NFS: Don't move layouts to plh_return_segs list while in use | Trond Myklebust | 1 | -11/+1
If the layout segment is still in use for a read or a write, we should not move it to the layout plh_return_segs list. If we do, we can end up returning the layout while I/O is still in progress. Fixes: e0b7d420f72a ("pNFS: Don't discard layout segments that are marked for return") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-12 | NFS: Add layout segment info to pnfs read/write/commit tracepoints | Trond Myklebust | 1 | -6/+36
Allow the pnfs I/O tracepoints to trace which layout segment is being used. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-10 | Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -2/+2
Pull locking updates from Thomas Gleixner:
  "A set of locking fixes and updates:
    - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible.
    - The seqcount associated lock debug addons which were blocked by the above fallout.
      seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held.
      This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held.
      Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up.
      Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this.
      While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemptible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section.
    - Conversion of seqcount usage sites to associated types and initializers"
* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-05 | NFS: Add tracepoints for layouterror and layoutstats. | Trond Myklebust | 2 | -2/+10
Allow tracing of the NFSv4.2 layouterror and layoutstats operations. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-05 | NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close() | Trond Myklebust | 2 | -2/+2
Ensure we correctly report the stateid and status in the layoutreturn on close tracepoint. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-04 | nfs: nfs_file_write() should check for writeback errors | Scott Mayhew | 1 | -3/+9
The NFS_CONTEXT_ERROR_WRITE flag (as well as the check of said flag) was removed by commit 6fbda89b257f. The absence of an error check allows writes to be continually queued up for a server that may no longer be able to handle them. Fix it by adding an error check using the generic error reporting functions. Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-03 | Merge branches 'pm-sleep', 'pm-domains', 'powercap' and 'pm-tools' | Rafael J. Wysocki | 1 | -1/+1
* pm-sleep:
  PM: sleep: spread "const char *" correctness
  PM: hibernate: fix white space in a few places
  freezer: Add unsafe version of freezable_schedule_timeout_interruptible() for NFS
  PM: sleep: core: Emit changed uevent on wakeup_sysfs_add/remove
* pm-domains:
  PM: domains: Restore comment indentation for generic_pm_domain.child_links
  PM: domains: Fix up terminology with parent/child
* powercap:
  powercap: Add Power Limit4 support
  powercap: idle_inject: Replace play_idle() with play_idle_precise() in comments
  powercap: intel_rapl: add support for Sapphire Rapids
* pm-tools:
  pm-graph v5.7 - important s2idle fixes
  cpupower: Replace HTTP links with HTTPS ones
  cpupower: Fix NULL but dereferenced coccicheck errors
  cpupower: Fix comparing pointer to 0 coccicheck warns
2020-08-01 | nfs: ensure correct writeback errors are returned on close() | Scott Mayhew | 2 | -2/+8
nfs_wb_all() calls filemap_write_and_wait(), which uses filemap_check_errors() to determine the error to return. filemap_check_errors() only looks at the mapping->flags and will therefore only return either -ENOSPC or -EIO. To ensure that the correct error is returned on close(), nfs{,4}_file_flush() should call filemap_check_wb_err() which looks at the errseq value in mapping->wb_err without consuming it. Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
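The difference between the two kinds of check can be modelled in user space. The sketch below is a deliberately simplified model, not kernel code: `struct mapping_model` and its helpers are hypothetical, and the real errseq mechanism tracks more state than a single integer. It only shows why a flag-based check collapses every error into -ENOSPC or -EIO, while a value-based check that does not consume the error can still report the original errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified model of a mapping's error state (not kernel code). */
struct mapping_model {
    bool flag_enospc; /* "out of space" flag bit */
    bool flag_eio;    /* generic I/O error flag bit */
    int  wb_err;      /* most recent writeback error, kept verbatim */
};

/* Record a writeback failure in both the flags and the verbatim value. */
static void record_error(struct mapping_model *m, int err)
{
    assert(err < 0);
    if (err == -ENOSPC)
        m->flag_enospc = true;
    else
        m->flag_eio = true; /* every other errno collapses to EIO here */
    m->wb_err = err;
}

/* Flag-based check: can only report -ENOSPC or -EIO, and clears the flags. */
static int check_flags(struct mapping_model *m)
{
    int ret = 0;

    if (m->flag_enospc) {
        m->flag_enospc = false;
        ret = -ENOSPC;
    }
    if (m->flag_eio) {
        m->flag_eio = false;
        ret = -EIO;
    }
    return ret;
}

/* Value-based check: reports the recorded errno without consuming it. */
static int check_wb_err(const struct mapping_model *m)
{
    return m->wb_err;
}
```

In this model a quota error recorded as -EDQUOT comes back as -EIO from the flag-based check, which is the kind of loss of detail the commit message describes for close().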
2020-07-30 | NFSv4.2: xattr cache: get rid of cache discard work queue | Frank van der Linden | 1 | -32/+5
Caches should be small enough to discard them inline, so do that instead of using a work queue. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-29 | NFSv4: Use sequence counter with associated spinlock | Ahmed S. Darwish | 2 | -2/+2
A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t data type, which allows to associate a spinlock with the sequence counter. This enables lockdep to verify that the spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-22-a.darwish@linutronix.de
2020-07-28 | NFS: remove redundant initialization of variable result | Colin Ian King | 1 | -1/+1
The variable result is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-17 | SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO compeletion") | Olga Kornievskaia | 2 | -10/+4
Reverting commit d03727b248d0 "NFSv4 fix CLOSE not waiting for direct IO compeletion". That patch made fput(), by calling inode_dio_done() in nfs_file_release(), wait uninterruptibly for any outstanding direct I/O to the file (but that wait on I/O should be killable). The problem the patch was also trying to address, REMOVE returning ERR_ACCESS because the file is still open, is supposed to be resolved by the server returning ERR_FILE_OPEN and not ERR_ACCESS. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-17 | NFSv4.0 allow nconnect for v4.0 | Olga Kornievskaia | 1 | -1/+1
It looks like this "else" is just a typo. It turns off nconnect for NFSv4.0 even though it works for every other version. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-17 | freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS | He Zhe | 1 | -1/+1
commit 0688e64bc600 ("NFS: Allow signal interruption of NFS4ERR_DELAYed operations") introduces nfs4_delay_interruptible which also needs an _unsafe version to avoid the following call trace for the same reason explained in commit 416ad3c9c006 ("freezer: add unsafe versions of freezable helpers for NFS")
  CPU: 4 PID: 3968 Comm: rm Tainted: G W 5.8.0-rc4 #1
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  Call trace:
   dump_backtrace+0x0/0x1dc
   show_stack+0x20/0x30
   dump_stack+0xdc/0x150
   debug_check_no_locks_held+0x98/0xa0
   nfs4_delay_interruptible+0xd8/0x120
   nfs4_handle_exception+0x130/0x170
   nfs4_proc_rmdir+0x8c/0x220
   nfs_rmdir+0xa4/0x360
   vfs_rmdir.part.0+0x6c/0x1b0
   do_rmdir+0x18c/0x210
   __arm64_sys_unlinkat+0x64/0x7c
   el0_svc_common.constprop.0+0x7c/0x110
   do_el0_svc+0x24/0xa0
   el0_sync_handler+0x13c/0x1b8
   el0_sync+0x158/0x180
Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-17 | Merge branch 'xattr-devel' | Trond Myklebust | 14 | -41/+2163
2020-07-14 | freezer: Add unsafe version of freezable_schedule_timeout_interruptible() for NFS | He Zhe | 1 | -1/+1
commit 0688e64bc600 ("NFS: Allow signal interruption of NFS4ERR_DELAYed operations") introduces nfs4_delay_interruptible which also needs an _unsafe version to avoid the following call trace for the same reason explained in commit 416ad3c9c006 ("freezer: add unsafe versions of freezable helpers for NFS")
  CPU: 4 PID: 3968 Comm: rm Tainted: G W 5.8.0-rc4 #1
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  Call trace:
   dump_backtrace+0x0/0x1dc
   show_stack+0x20/0x30
   dump_stack+0xdc/0x150
   debug_check_no_locks_held+0x98/0xa0
   nfs4_delay_interruptible+0xd8/0x120
   nfs4_handle_exception+0x130/0x170
   nfs4_proc_rmdir+0x8c/0x220
   nfs_rmdir+0xa4/0x360
   vfs_rmdir.part.0+0x6c/0x1b0
   do_rmdir+0x18c/0x210
   __arm64_sys_unlinkat+0x64/0x7c
   el0_svc_common.constprop.0+0x7c/0x110
   do_el0_svc+0x24/0xa0
   el0_sync_handler+0x13c/0x1b8
   el0_sync+0x158/0x180
Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-13 | NFSv4.2: add client side xattr caching. | Frank van der Linden | 7 | -8/+1172
Implement client side caching for NFSv4.2 extended attributes. The cache is a per-inode hashtable, with name/value entries. There is one special entry for the listxattr cache. NFS inodes have a pointer to a cache structure. The cache structure is allocated on demand, freed when the cache is invalidated. Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE) are collected by a separate shrinker, and freed more aggressively than others. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
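As a rough illustration of the data structure described here (a per-inode hashtable of name/value entries, allocated on demand and freed on invalidation), a user-space sketch might look like the following. All names (`xc_cache`, `xc_add()`, `xc_lookup()`, `xc_invalidate()`), the bucket count and the hash function are assumptions made for the example; the sketch leaves out the listxattr entry, locking and the shrinkers that the commit message mentions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define XC_BUCKETS 8 /* hypothetical bucket count */

struct xc_entry {
    struct xc_entry *next;
    char *name;
    char *value;
};

struct xc_cache {
    struct xc_entry *buckets[XC_BUCKETS];
};

/* Portable replacement for strdup(). */
static char *xc_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p)
        memcpy(p, s, n);
    return p;
}

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned xc_hash(const char *name)
{
    unsigned h = 5381;

    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h % XC_BUCKETS;
}

/* Returns the cached value for `name`, or NULL if absent or no cache yet. */
static const char *xc_lookup(const struct xc_cache *c, const char *name)
{
    if (c == NULL)
        return NULL;
    for (const struct xc_entry *e = c->buckets[xc_hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->value;
    return NULL;
}

/*
 * Add an entry, allocating the cache on first use. `slot` plays the role of
 * the inode's pointer to its cache. A repeated name shadows the older entry.
 */
static int xc_add(struct xc_cache **slot, const char *name, const char *value)
{
    struct xc_entry *e;
    unsigned b;

    assert(slot != NULL);
    if (*slot == NULL && (*slot = calloc(1, sizeof(**slot))) == NULL)
        return -1;
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->name = xc_strdup(name);
    e->value = xc_strdup(value);
    if (e->name == NULL || e->value == NULL) {
        free(e->name);
        free(e->value);
        free(e);
        return -1;
    }
    b = xc_hash(name);
    e->next = (*slot)->buckets[b];
    (*slot)->buckets[b] = e;
    return 0;
}

/* Free every entry and the cache itself, as on cache invalidation. */
static void xc_invalidate(struct xc_cache **slot)
{
    struct xc_cache *c = *slot;

    if (c == NULL)
        return;
    for (int i = 0; i < XC_BUCKETS; i++) {
        struct xc_entry *e = c->buckets[i];

        while (e) {
            struct xc_entry *next = e->next;

            free(e->name);
            free(e->value);
            free(e);
            e = next;
        }
    }
    free(c);
    *slot = NULL;
}
```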
2020-07-13 | NFSv4.2: hook in the user extended attribute handlers | Frank van der Linden | 1 | -2/+121
Now that all the lower level code is there to make the RPC calls, hook it in to the xattr handlers and the listxattr entry point, to make them available. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | NFSv4.2: add the extended attribute proc functions. | Frank van der Linden | 2 | -0/+244
Implement the extended attribute procedures for NFSv4.2 extended attribute support (RFC 8276). Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | nfs: make the buf_to_pages_noslab function available to the nfs code | Frank van der Linden | 2 | -2/+4
Make the buf_to_pages_noslab function available to the rest of the NFS code. Rename it to nfs4_buf_to_pages_noslab to be consistent. This will be used later in the NFSv4.2 xattr code. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | nfs: define and use the NFS_INO_INVALID_XATTR flag | Frank van der Linden | 3 | -3/+10
Define the NFS_INO_INVALID_XATTR flag, to be used for the NFSv4.2 xattr cache, and use it where appropriate. No functional change as yet. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | nfs: modify update_changeattr to deal with regular files | Frank van der Linden | 2 | -26/+49
Until now, change attributes in change_info form were only returned by directory operations. However, they are also used for the RFC 8276 extended attribute operations, which work on both directories and regular files. Modify update_changeattr to deal:
  * Rename it to nfs4_update_changeattr and make it non-static.
  * Don't always use INO_INVALID_DATA, this isn't needed for a directory that only had its extended attributes changed by us.
  * Existing callers now always pass in INO_INVALID_DATA.
For the current callers of this function, behavior is unchanged. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | NFSv4.2: query the extended attribute access bits | Frank van der Linden | 2 | -0/+10
RFC 8276 defines separate ACCESS bits for extended attribute checking. Query them in nfs_do_access and opendata. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | nfs: define nfs_access_get_cached function | Frank van der Linden | 1 | -4/+16
The only consumer of nfs_access_get_cached_rcu and nfs_access_cached calls these static functions in order to first try RCU access, and then locked access. Combine them in to a single function, and call that. Make this function available to the rest of the NFS code. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | NFSv4.2: add client side XDR handling for extended attributes | Frank van der Linden | 2 | -2/+372
Define the argument and response structures that will be used for RFC 8276 extended attribute RPC calls, and implement the necessary functions to encode/decode the extended attribute operations. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | NFSv4.2: query the server for extended attribute support | Frank van der Linden | 3 | -1/+30
Query the server for extended attribute support, and record it as the NFS_CAP_XATTR flag in the server capabilities. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | NFSv4.2: define limits and sizes for user xattr handling | Frank van der Linden | 5 | -2/+144
Set limits for extended attributes (attribute value size and listxattr buffer size), based on the fs-independent limits (XATTR_*_MAX). Define the maximum XDR sizes for the RFC 8276 XATTR operations. In the case of operations that carry a larger payload (SETXATTR, GETXATTR, LISTXATTR), these exclude that payload, which is added as separate pages, like other operations do. Define, much like for read and write operations, the maximum overhead sizes for get/set/listxattr, and use them to limit the maximum payload size for those operations, in combination with the channel attributes. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 | NFS: Fix interrupted slots by sending a solo SEQUENCE operation | Anna Schumaker | 1 | -2/+18
We used to do this before 3453d5708b33, but this was changed to better handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the slot re-use case when the server doesn't receive the interrupted operation, but if the server does receive the operation then it could still end up replying to the client with mis-matched operations from the reply cache. We can fix this by sending a SEQUENCE to the server while recovering from a SEQ_MISORDERED error when we detect that we are in an interrupted slot situation. Fixes: 3453d5708b33 (NFSv4.1: Avoid false retries when RPC calls are interrupted) Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-12 | pNFS/flexfiles: The mirror count could depend on the layout segment range | Trond Myklebust | 1 | -2/+2
Make sure we specify the layout segment range when calculating the mirror count. In theory, that number could depend on the range to which we're writing. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-12 | pNFS/flexfiles: Clean up redundant calls to pnfs_put_lseg() | Trond Myklebust | 1 | -8/+2
Both nfs_pageio_reset_read_mds() and nfs_pageio_reset_write_mds() do call pnfs_generic_pg_cleanup() for us. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-12 | NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC | Trond Myklebust | 1 | -1/+3
If the application uses the AT_STATX_DONT_SYNC flag after doing readdir(), then we should still mark the parent inode as seeing a readdirplus hit. That ensures that we continue to use readdirplus in the 'ls -l' type of workflow to do fast lookups of the dentries. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-06-26 | NFSv4 fix CLOSE not waiting for direct IO compeletion | Olga Kornievskaia | 2 | -4/+10
Figuring out the root cause for the REMOVE/CLOSE race and suggesting the solution was done by Neil Brown. Currently what happens is that direct IO calls hold a reference on the open context which is decremented as an asynchronous task in nfs_direct_complete(). Before the reference is decremented, control is returned to the application, which is free to close the file. When close is being processed, it decrements its reference on the open_context, but since direct IO still holds one, it doesn't send a close on the wire. It returns control to the application, which is free to do other operations. For instance, it can delete a file. Direct IO then finally releases its reference and triggers an asynchronous close, which races with the REMOVE. On the server, REMOVE can be processed before the CLOSE, failing the REMOVE with EACCES as the file is still opened. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Suggested-by: Neil Brown <neilb@suse.com> CC: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-26 | pNFS/flexfiles: Fix list corruption if the mirror count changes | Trond Myklebust | 1 | -4/+7
If the mirror count changes in the new layout we pick up inside ff_layout_pg_init_write(), then we can end up adding the request to the wrong mirror and corrupting the mirror->pg_list. Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-26 | nfs: Fix memory leak of export_path | Tom Rix | 1 | -0/+1
The try_location function is called within a loop by nfs_follow_referral. try_location calls nfs4_pathname_string to create the export_path. nfs4_pathname_string allocates the memory. export_path is stored in the nfs_fs_context/fs_context structure in the same way as hostname and source. But whereas the ctx hostname and source are freed before assignment, export_path is not. So if there are multiple loops, the new export_path will overwrite the old without the old being freed. So call kfree for export_path. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
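The leak and its fix follow a common pattern: a pointer field that is reassigned on every loop iteration must have its old value freed first. The user-space sketch below uses hypothetical names (`struct ctx_model`, `set_export_path()`, `try_locations()`) and malloc()/free() in place of the kernel allocator; it is an illustration of the pattern, not the actual patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the context that owns export_path. */
struct ctx_model {
    char *export_path;
};

/* Replace ctx->export_path with a copy of `path`, freeing the old value. */
static int set_export_path(struct ctx_model *ctx, const char *path)
{
    size_t n;
    char *copy;

    assert(ctx != NULL && path != NULL);
    n = strlen(path) + 1;
    copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, path, n);
    free(ctx->export_path); /* without this, each iteration leaks a string */
    ctx->export_path = copy;
    return 0;
}

/* Loop over candidate locations, as a referral walk might. */
static int try_locations(struct ctx_model *ctx, const char *const *paths,
                         size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (set_export_path(ctx, paths[i]) != 0)
            return -1;
    return 0;
}
```

Because free(NULL) is a no-op, the first iteration needs no special case.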
2020-06-11 | Merge tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs | Linus Torvalds | 9 | -10/+125
Pull NFS client updates from Anna Schumaker:
  "New features and improvements:
    - Sunrpc receive buffer sizes only change when establishing a GSS credentials
    - Add more sunrpc tracepoints
    - Improve on tracepoints to capture internal NFS I/O errors
   Other bugfixes and cleanups:
    - Move a dprintk() to after a call to nfs_alloc_fattr()
    - Fix off-by-one issues in rpc_ntop6
    - Fix a few coccicheck warnings
    - Use the correct SPDX license identifiers
    - Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
    - Replace zero-length array with flexible array
    - Remove duplicate headers
    - Set invalid blocks after NFSv4 writes to update space_used attribute
    - Fix direct WRITE throughput regression"
* tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
  NFS: Fix direct WRITE throughput regression
  SUNRPC: rpc_xprt lifetime events should record xprt->state
  xprtrdma: Make xprt_rdma_slot_table_entries static
  nfs: set invalid blocks after NFSv4 writes
  NFS: remove redundant initialization of variable result
  sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs
  NFS: Add a tracepoint in nfs_set_pgio_error()
  NFS: Trace short NFS READs
  NFS: nfs_xdr_status should record the procedure name
  SUNRPC: Set SOFTCONN when destroying GSS contexts
  SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT
  SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS
  SUNRPC: trace RPC client lifetime events
  SUNRPC: Trace transport lifetime events
  SUNRPC: Split the xdr_buf event class
  SUNRPC: Add tracepoint to rpc_call_rpcerror()
  SUNRPC: Update the RPC_SHOW_SOCKET() macro
  SUNRPC: Update the rpc_show_task_flags() macro
  SUNRPC: Trace GSS context lifetimes
  SUNRPC: receive buffer size estimation values almost never change
  ...
2020-06-11 | NFS: Fix direct WRITE throughput regression | Chuck Lever | 1 | -0/+2
I measured a 50% throughput regression for large direct writes. The observed on-the-wire behavior is that the client sends every NFS WRITE twice: once as an UNSTABLE WRITE plus a COMMIT, and once as a FILE_SYNC WRITE. This is because the nfs_write_match_verf() check in nfs_direct_commit_complete() fails for every WRITE. Buffered writes use nfs_write_completion(), which sets req->wb_verf correctly. Direct writes use nfs_direct_write_completion(), which does not set req->wb_verf at all. This leaves req->wb_verf set to all zeroes for every direct WRITE, and thus nfs_direct_commit_completion() always sets NFS_ODIRECT_RESCHED_WRITES. This fix appears to restore nearly all of the lost performance. Fixes: 1f28476dcb98 ("NFS: Fix O_DIRECT commit verifier handling") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
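The mechanism behind the regression can be shown with a toy model of verifier matching. Everything here is hypothetical (`struct verf`, `struct req_model`, `write_done()`, `needs_resend()`), and the 8-byte verifier size is an assumption of the sketch. The idea is only that a request whose stored verifier was never filled in can never match the verifier returned by COMMIT, so it is always rescheduled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define VERF_SIZE 8 /* assumed verifier size for this sketch */

struct verf {
    unsigned char data[VERF_SIZE];
};

/* Hypothetical request that remembers the verifier from its WRITE reply. */
struct req_model {
    struct verf wb_verf;
};

/* WRITE completion: store the verifier the server returned for the write. */
static void write_done(struct req_model *req, const struct verf *from_write)
{
    assert(req != NULL && from_write != NULL);
    req->wb_verf = *from_write;
}

/* COMMIT completion: true if the write has to be sent again. */
static bool needs_resend(const struct req_model *req,
                         const struct verf *from_commit)
{
    return memcmp(req->wb_verf.data, from_commit->data, VERF_SIZE) != 0;
}
```

If `write_done()` is skipped, the stored verifier stays all zeroes and `needs_resend()` reports a mismatch for every request, which corresponds to the doubled WRITE traffic described above.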
2020-06-11 | nfs: set invalid blocks after NFSv4 writes | Zheng Bin | 1 | -3/+11
Use the following command to test nfsv4 (size of file1M is 1MB):
  mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt
  cp file1M /mnt
  du -h /mnt/file1M -->0 within 60s, then 1M
When the write is done (cp file1M /mnt), this is called:
  nfs_writeback_done
  nfs4_write_done
  nfs4_write_done_cb
  nfs_writeback_update_inode
  nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime
  nfs_post_op_update_inode_force_wcc_locked
  nfs_set_cache_invalid
  nfs_refresh_inode_locked
  nfs_update_inode
The nfsd write response contains change, ctime, mtime; the flag will be cleared after nfs_update_inode. However, the write response does not contain space_used, and the previous open response contains space_used whose value is 0, so inode->i_blocks is still 0.
  nfs_getattr -->called by "du -h"
  do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s
  cache_validity = READ_ONCE(NFS_I(inode)->cache_validity)
  do_update |= cache_validity & (NFS_INO_INVALID_ATTR -->false
  if (do_update) { __nfs_revalidate_inode }
Within 60s, no getattr request is sent to nfsd, thus "du -h /mnt/file1M" is 0. Add a NFS_INO_INVALID_BLOCKS flag, set it when nfsv4 write is done. Fixes: 16e143751727 ("NFS: More fine grained attribute tracking") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 | NFS: remove redundant initialization of variable result | Colin Ian King | 1 | -1/+1
The variable result is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 | NFS: Add a tracepoint in nfs_set_pgio_error() | Chuck Lever | 2 | -0/+46
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 | NFS: Trace short NFS READs | Chuck Lever | 2 | -0/+49
A short read can generate an -EIO error without there being an error on the wire. This tracepoint acts as an eyecatcher when there is no obvious I/O error. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 | NFS: nfs_xdr_status should record the procedure name | Chuck Lever | 1 | -2/+13
When sunrpc trace points are not enabled, the recorded task ID information alone is not helpful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-02 | mm: remove the pgprot argument to __vmalloc | Christoph Hellwig | 1 | -1/+1
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 | mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead | NeilBrown | 2 | -5/+9
After an NFS page has been written it is considered "unstable" until a COMMIT request succeeds. If the COMMIT fails, the page will be re-written. These "unstable" pages are currently accounted as "reclaimable", either in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a 'reclaimable' count. This might have made sense when sending the COMMIT required a separate action by the VFS/MM (e.g. releasepage() used to send a COMMIT). However now that all writes generated by ->writepages() will automatically be followed by a COMMIT (since commit 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")) it makes more sense to treat them as writeback pages. So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in NR_WRITEBACK and WB_WRITEBACK. A particular effect of this change is that when wb_check_background_flush() calls wb_over_bg_threshold(), the latter will report 'true' a lot less often as the 'unstable' pages are no longer considered 'dirty' (as there is nothing that writeback can do about them anyway). Currently wb_check_background_flush() will trigger writeback to NFS even when there are relatively few dirty pages (if there are lots of unstable pages), this can result in small writes going to the server (10s of Kilobytes rather than a Megabyte) which hurts throughput. With this patch, there are fewer writes which are each larger on average. Where the NR_UNSTABLE_NFS count was included in statistics virtual-files, the entry is retained, but the value is hard-coded as zero. static trace points and warning printks which mentioned this counter no longer report it. 
[akpm@linux-foundation.org: re-layout comment] [akpm@linux-foundation.org: fix printk warning] Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com> Acked-by: Michal Hocko <mhocko@suse.com> [mm] Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Link: http://lkml.kernel.org/r/87d06j7gqa.fsf@notabene.neil.brown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-27 | NFS: remove duplicate headers | Chen Zhou | 1 | -1/+0
Remove duplicate headers which are included twice. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>