aboutsummaryrefslogtreecommitdiffstats
path: root/fs/nfs/nfs4state.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-05-05NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSIONTrond Myklebust1-3/+7
If the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION because we are trunking, then RECLAIM_COMPLETE must handle that by calling nfs4_schedule_session_recovery() and then retrying. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2017-01-30NFSv4: Fix warning for using 0 as NULLWei Yongjun1-1/+1
Fixes the following sparse warning: fs/nfs/nfs4state.c:862:60: warning: Using plain integer as NULL pointer Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-01-26nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"Chuck Lever1-0/+1
Lock sequence IDs are bumped in decode_lock by calling nfs_increment_seqid(). nfs_increment_sequid() does not use the seqid_mutating_err() function fixed in commit 059aa7348241 ("Don't increment lock sequence ID after NFS4ERR_MOVED"). Fixes: 059aa7348241 ("Don't increment lock sequence ID after ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Xuan Qi <xuan.qi@oracle.com> Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-01-13NFSv4: Fix client recovery when server reboots multiple timesTrond Myklebust1-1/+0
If the server reboots multiple times, the client should rely on the server to tell it that it cannot reclaim state as per section 9.6.3.4 in RFC7530 and section 8.4.2.1 in RFC5661. Currently, the client is being to conservative, and is assuming that if the server reboots while state recovery is in progress, then it must ignore state that was not recovered before the reboot. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQIDNeilBrown1-16/+13
When an NFS4ERR_BAD_SEQID is received the open-owner is removed from the ->state_owners rbtree so that it will no longer be used. If any stateids attached to this open-owner are still in use, and if a request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad. The state is marked as needing recovery and the nfs4_state_manager() is scheduled to clean up. nfs4_state_manager() finds states to be recovered by walking the state_owners rbtree. As the open-owner is not in the rbtree, the bad state is not found so nfs4_state_manager() completes having done nothing. The request is then retried, with a predicatable result (indefinite retries). If the stateid is for a delegation, this open_owner will be used to open files when the delegation is returned. For that to work, a new open-owner needs to be presented to the server. This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner in the rbtree but updates the 'create_time' so it looks like a new open-owner. With this the indefinite retries no longer happen. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFSv4: ensure __nfs4_find_lock_state returns consistent result.NeilBrown1-8/+20
If a file has both flock locks and OFD locks, then it is possible that two different nfs4 lock states could apply to file accesses from a single process. It is not possible to know, efficiently, which one is "correct". Presumably the state which represents a lock that covers the region undergoing IO would be the "correct" one to use, but finding that has a non-trivial cost and would provide miniscule value. Currently we just return whichever is first in the list, which could result in inconsistent behaviour if an application ever put it self in this position. As consistent behaviour is preferable (when perfectly correct behaviour is not available), change the search to return a consistent result in this circumstance. Specifically: if there is both a flock and OFD lock state, always return the flock one. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery()Trond Myklebust1-1/+1
If the session has an error, then we want to start by recovering the session, as any SEQUENCE we send is going to fail with a session error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01NFS: discard nfs_lockowner structure.NeilBrown1-1/+1
It now has only one field and is only used in one structure. So replaced it in that structure by the field it contains. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01NFSv4: enhance nfs4_copy_lock_stateid to use a flock stateid if there is oneNeilBrown1-5/+9
A process can have two possible lock owner for a given open file: a per-process Posix lock owner and a per-open-file flock owner Use both of these when searching for a suitable stateid to use. With this patch, READ/WRITE requests will use the correct stateid if a flock lock is active. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01NFSv4: change nfs4_select_rw_stateid to take a lock_context inplace of lock_ownerNeilBrown1-6/+5
The only time that a lock_context is not immediately available is in setattr, and now that it has an open_context, it can easily find one with nfs_get_lock_context. This removes the need for the on-stack nfs_lockowner. This change is preparation for correctly support flock stateids. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-11-18NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_stateBenjamin Coddington1-0/+1
Now that we're doing TEST_STATEID in nfs4_reclaim_open_state(), we can have a NFS4ERR_OLD_STATEID returned from nfs41_open_expired() . Instead of marking state recovery as failed, mark the state for recovery again. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27NFSv4: If recovery failed for a specific open stateid, then don't retryTrond Myklebust1-7/+11
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27NFSv4: Open state recovery must account for file permission changesTrond Myklebust1-0/+3
If the file permissions change on the server, then we may not be able to recover open state. If so, we need to ensure that we mark the file descriptor appropriately. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27NFSv4: nfs_inode_find_state_and_recover() should check all stateidsTrond Myklebust1-5/+39
Modify the helper nfs_inode_find_state_and_recover() so that it can check all open/lock/delegation state trackers on that inode for whether or not they need are affected by a revoked stateid error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27NFSv4.1: Test delegation stateids when server declares "some state revoked"Trond Myklebust1-8/+9
According to RFC5661, if any of the SEQUENCE status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED, or SEQ4_STATUS_RECALLABLE_STATE_REVOKED are set, then we need to use TEST_STATEID to figure out which stateids have been revoked, so we can acknowledge the loss of state using FREE_STATEID. While we already do this for open and lock state, we have not been doing so for all the delegations. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27NFSv4.1: Don't deadlock the state manager on the SEQUENCE status flagsTrond Myklebust1-1/+11
As described in RFC5661, section 18.46, some of the status flags exist in order to tell the client when it needs to acknowledge the existence of revoked state on the server and/or to recover state. Those flags will then remain set until the recovery procedure is done. In order to avoid looping, the client therefore needs to ignore those particular flags while recovering. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-08-05NFSv4: Cleanup the setting of the nfs4 lease periodTrond Myklebust1-6/+3
Make a helper function nfs4_set_lease_period() and have nfs41_setup_state_renewal() and nfs4_do_fsinfo() use it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-24nfs4: Fix potential use after free of state in nfs4_do_reclaim.Oleg Drokin1-1/+1
Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery") introduced access to state after it was just potentially freed by nfs4_put_open_state leading to a random data corruption somewhere. BUG: unable to handle kernel paging request at ffff88004941ee40 IP: [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060 Oops: 0002 [#1] SMP DEBUG_PAGEALLOC Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000 RIP: 0010:[<ffffffff813baf01>] [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 RSP: 0018:ffff88003ff4bd68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8 RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88 RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98 R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00 FS: 0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0 Stack: ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68 ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48 ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40 Call Trace: [<ffffffff813baad5>] ? nfs4_do_reclaim+0x35/0x740 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0 [<ffffffff813bb7cd>] nfs4_run_state_manager+0x5ed/0xa40 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740 [<ffffffff810af0d1>] kthread+0x101/0x120 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0 [<ffffffff818843af>] ret_from_fork+0x1f/0x40 [<ffffffff810aefd0>] ? kthread_create_on_node+0x250/0x250 Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff <f0> 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6 RIP [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 RSP <ffff88003ff4bd68> CR2: ffff88004941ee40 Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-27nfs: fix anonymous member initializer build failure with older compilersLinus Torvalds1-1/+1
Older versions of gcc don't understand named initializers inside a anonymous structure or union member. It can be worked around by adding the bracin gin the initializer for the anonymous member. Without this, gcc 4.4.4 will fail the build with CC fs/nfs/nfs4state.o fs/nfs/nfs4state.c:69: error: unknown field ‘data’ specified in initializer fs/nfs/nfs4state.c:69: warning: missing braces around initializer fs/nfs/nfs4state.c:69: warning: (near initialization for ‘zero_stateid.<anonymous>.data’) make[2]: *** [fs/nfs/nfs4state.o] Error 1 introduced in commit 93b717fd81bf ("NFSv4: Label stateids with the type") Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Anna Schumaker <Anna.Schumaker@netapp.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-17NFSv4: Use the right stateid for delegations in setattr, read and writeTrond Myklebust1-4/+9
When we're using a delegation to represent our open state, we should ensure that we use the stateid that was used to create that delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17NFSv4: Label stateids with the typeTrond Myklebust1-1/+4
In order to more easily distinguish what kind of stateid we are dealing with, introduce a type that can be used to label the stateid structure. The label will be useful both for debugging, but also when dealing with operations like SETATTR, READ and WRITE that can take several different types of stateid as arguments. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-10-02NFSv4: Don't try to reclaim unused state ownersTrond Myklebust1-1/+2
Currently, we don't test if the state owner is in use before we try to recover it. The problem is that if the refcount is zero, then the state owner will be waiting on the lru list for garbage collection. The expectation in that case is that if you bump the refcount, then you must also remove the state owner from the lru list. Otherwise the call to nfs4_put_state_owner will corrupt that list by trying to add our state owner a second time. Avoid the whole problem by just skipping state owners that hold no state. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-17Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mountOlga Kornievskaia1-1/+1
A test case is as the description says: open(foobar, O_WRONLY); sleep() --> reboot the server close(foobar) The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few line before going to restart, there is clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags). NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes out state and when we go to close it, “call_close” doesn’t get set as state flag is not set and CLOSE doesn’t go on the wire. Signed-off-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17NFS: Remove nfs41_server_notify_{target|highest}_slotid_update()Anna Schumaker1-11/+1
All these functions do is call nfs41_ping_server() without adding anything. Let's remove them and give nfs41_ping_server() a better name instead. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-05NFSv4.1: Handle SEQ4_STATUS_BACKCHANNEL_FAULT correctlyTrond Myklebust1-3/+3
RFC5661 states: The server has encountered an unrecoverable fault with the backchannel (e.g., it has lost track of the sequence ID for a slot in the backchannel). The client MUST stop sending more requests on the session's fore channel, wait for all outstanding requests to complete on the fore and back channel, and then destroy the session. Ensure we do so... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-05NFSv4.1: Handle SEQ4_STATUS_RECALLABLE_STATE_REVOKED status bit correctlyTrond Myklebust1-2/+4
Try to handle this for now by invalidating all outstanding layouts for this server and then testing all the open+lock+delegation stateids. At some later stage, we may want to optimise by separating out the testing of delegation stateids only, and adding testing of layout stateids. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-05NFSv4.1: Handle SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED status bit correctly.Trond Myklebust1-4/+13
If the server tells us that only some state has been revoked, then we need to run the full TEST_STATEID dog and pony show in order to discover which locks and delegations are still OK. Currently we blow away all state, which means that we lose all locks! Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-02NFSv4: Always drain the slot table before re-establishing the leaseTrond Myklebust1-1/+1
While the NFSv4.1 code has always drained the slot tables in order to stop non-recovery related RPC calls when doing lease recovery, the NFSv4 code did not. The reason for the difference in behaviour is that NFSv4 does not have session state, and so RPC calls can in theory proceed while recovery is happening. In practice, however, anything I/O or state related needs to wait until recovery is over. This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that we can simplify the state recovery code by assuming that we do not have to deal with races between recovery and ordinary I/O. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-01fixing infinite OPEN loop in 4.0 stateid recoveryOlga Kornievskaia1-0/+2
Problem: When an operation like WRITE receives a BAD_STATEID, even though recovery code clears the RECLAIM_NOGRACE recovery flag before recovering the open state, because of clearing delegation state for the associated inode, nfs_inode_find_state_and_recover() gets called and it makes the same state with RECLAIM_NOGRACE flag again. As a results, when we restart looking over the open states, we end up in the infinite loop instead of breaking out in the next test of state flags. Solution: unset the RECLAIM_NOGRACE set because of calling of nfs_inode_find_state_and_recover() after returning from calling recover_open() function. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-26Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-1/+1
Pull NFS client updates from Trond Myklebust: "Another set of mainly bugfixes and a couple of cleanups. No new functionality in this round. Highlights include: Stable patches: - Fix a regression in /proc/self/mountstats - Fix the pNFS flexfiles O_DIRECT support - Fix high load average due to callback thread sleeping Bugfixes: - Various patches to fix the pNFS layoutcommit support - Do not cache pNFS deviceids unless server notifications are enabled - Fix a SUNRPC transport reconnection regression - make debugfs file creation failure non-fatal in SUNRPC - Another fix for circular directory warnings on NFSv4 "junctioned" mountpoints - Fix locking around NFSv4.2 fallocate() support - Truncating NFSv4 file opens should also sync O_DIRECT writes - Prevent infinite loop in rpcrdma_ep_create() Features: - Various improvements to the RDMA transport code's handling of memory registration - Various code cleanups" * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits) fs/nfs: fix new compiler warning about boolean in switch nfs: Remove unneeded casts in nfs NFS: Don't attempt to decode missing directory entries Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one" NFS: Rename idmap.c to nfs4idmap.c NFS: Move nfs_idmap.h into fs/nfs/ NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h NFS: Add a stub for GETDEVICELIST nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes nfs: fix DIO good bytes calculation nfs: Fetch MOUNTED_ON_FILEID when updating an inode sunrpc: make debugfs file creation failure non-fatal nfs: fix high load average due to callback thread sleeping NFS: Reduce time spent holding the i_mutex during fallocate() NFS: Don't zap caches on fallocate() xprtrdma: Make rpcrdma_{un}map_one() into inline functions xprtrdma: Handle non-SEND completions via a callout xprtrdma: Add "open" memreg op xprtrdma: Add "destroy MRs" memreg op xprtrdma: Add "reset MRs" memreg op ...
2015-04-23NFS: Move nfs_idmap.h into fs/nfs/Anna Schumaker1-1/+1
This file is only used internally to the NFS v4 module, so it doesn't need to be in the global include path. I also renamed it from nfs_idmap.h to nfs4idmap.h to emphasize that it's an NFSv4-only include file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-15VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells1-2/+2
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-03NFSv4.1: Clear the old state by our client id before establishing a new leaseTrond Myklebust1-1/+5
If the call to exchange-id returns with the EXCHGID4_FLAG_CONFIRMED_R flag set, then that means our lease was established by a previous mount instance. Ensure that we detect this situation, and that we clear the state held by that mount. Reported-by: Jorge Mora <Jorge.Mora@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-03NFSv4: Fix a race in NFSv4.1 server trunking discoveryTrond Myklebust1-2/+12
We do not want to allow a race with another NFS mount to cause nfs41_walk_client_list() to establish a lease on our nfs_client before we're done checking for trunking. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-11Merge tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-12/+19
Pull NFS client updates from Trond Myklebust: "Highlights incluse: Features: - Removing the forced serialisation of open()/close() calls in NFSv4.x (x>0) makes for a significant performance improvement in metadata intensive workloads. - Full support for the pNFS "flexible files" layout type - Further RPC/RDMA client improvements from Chuck Bugfixes: - Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING - Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL - Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called as part of the namespace cleanup, - Stable fix: Ensure we reference the inode for return-on-close in delegreturn - Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to the same source address/port combination during a disconnect/ reconnect event. This is a requirement imposed by most NFSv3 server duplicate reply cache implementations. Optimisations: - Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT Other: - Add Anna Schumaker as co-maintainer for the NFS client" * tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits) SUNRPC: Cleanup to remove xs_tcp_close() pnfs: delete an unintended goto pnfs/flexfiles: Do not dprintk after the free SUNRPC: Fix stupid typo in xs_sock_set_reuseport SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG SUNRPC: Handle connection reset more efficiently. SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT SUNRPC: Remove TCP socket linger code SUNRPC: Remove TCP client connection reset hack SUNRPC: TCP/UDP always close the old socket before reconnecting SUNRPC: Add helpers to prevent socket create from racing SUNRPC: Ensure xs_reset_transport() resets the close connection flags SUNRPC: Do not clear the source port in xs_reset_transport SUNRPC: Handle EADDRINUSE on connect SUNRPC: Set SO_REUSEPORT socket option for TCP connections NFSv4.1: Fix pnfs_put_lseg races NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS ...
2015-01-23NFSv4: Check for NULL argument in nfs_*_seqid() functionsTrond Myklebust1-7/+14
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-23NFSv4: Convert nfs_alloc_seqid() to return an ERR_PTR() if allocation failsTrond Myklebust1-5/+5
When we relax the sequencing on the NFSv4.1 OPEN/CLOSE code, we will want to use the value NULL to indicate that no sequencing is needed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-16locks: add a dedicated spinlock to protect i_flctx listsJeff Layton1-4/+4
We can now add a dedicated spinlock without expanding struct inode. Change to using that to protect the various i_flctx lists. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16locks: convert posix locks to file_lock_contextJeff Layton1-42/+10
Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16locks: move flock locks to file_lock_contextJeff Layton1-2/+40
Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2014-09-30Merge branch 'bugfixes' into linux-nextTrond Myklebust1-11/+6
* bugfixes: NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails NFS: Fabricate fscache server index key correctly SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT nfs: fix duplicate proc entries
2014-09-28NFSv4: fix open/lock state recovery error handlingTrond Myklebust1-10/+6
The current open/lock state recovery unfortunately does not handle errors such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping, just proceeds as if the state manager is finished recovering. This patch ensures that we loop back, handle higher priority errors and complete the open/lock state recovery. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-28NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM failsTrond Myklebust1-1/+0
If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted a second time, then the client will currently take this to mean that it must declare all locks to be stale, and hence ineligible for reboot recovery. RFC3530 and RFC5661 both suggest that the client should instead rely on the server to respond to inelegible open share, lock and delegation reclaim requests with NFS4ERR_NO_GRACE in this situation. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-24Fixing lease renewalOlga Kornievskaia1-0/+1
Commit c9fdeb28 removed a 'continue' after checking if the lease needs to be renewed. However, if client hasn't moved, the code falls down to starting reboot recovery erroneously (ie., sends open reclaim and gets back stale_clientid error) before recovering from getting stale_clientid on the renew operation. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Fixes: c9fdeb280b8c (NFS: Add basic migration support to state manager thread) Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-08nfs: revert "nfs4: queue free_lock_state job submission to nfsiod"Jeff Layton1-18/+6
This reverts commit 49a4bda22e186c4d0eb07f4a36b5b1a378f9398d. Christoph reported an oops due to the above commit: generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1] SMP [ 2187.042899] Modules linked in: [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151 [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 2187.044287] Workqueue: nfsiod free_lock_state_work [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000 [ 2187.044287] RIP: 0010:[<ffffffff81361ca6>] [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 [ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296 [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000 [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8 [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000 [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240 [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000 [ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0 [ 2187.044287] Stack: [ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258 [ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0 [ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240 [ 2187.044287] Call Trace: [ 2187.044287] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 [ 2187.044287] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 [ 2187.044287] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 [ 2187.044287] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 [ 2187.044287] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 [ 2187.044287] [<ffffffff810d3c6f>] kthread+0xdf/0x100 [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 [ 2187.044287] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3 [ 2187.044287] RIP [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 [ 2187.044287] RSP <ffff88007a4efd58> [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]--- The original reason for this patch was because the fl_release_private operation couldn't sleep. With commit ed9814d85810 (locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped), this is no longer a problem so we can revert this patch. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-13Merge tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-42/+27
Pull NFS client updates from Trond Myklebust: "Highlights include: - stable fix for a bug in nfs3_list_one_acl() - speed up NFS path walks by supporting LOOKUP_RCU - more read/write code cleanups - pNFS fixes for layout return on close - fixes for the RCU handling in the rpcsec_gss code - more NFS/RDMA fixes" * tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) nfs: reject changes to resvport and sharecache during remount NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred NFS: fix two problems in lookup_revalidate in RCU-walk NFS: allow lockless access to access_cache NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU NFS: support RCU_WALK in nfs_permission() sunrpc/auth: allow lockless (rcu) lookup of credential cache. NFS: prepare for RCU-walk support but pushing tests later in code. NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used. NFS: add checks for returned value of try_module_get() nfs: clear_request_commit while holding i_lock pnfs: add pnfs_put_lseg_async pnfs: find swapped pages on pnfs commit lists too nfs: fix comment and add warn_on for PG_INODE_REF nfs: check wait_on_bit_lock err in page_group_lock sunrpc: remove "ec" argument from encrypt_v2 operation sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c ...
2014-07-16sched: Remove proliferation of wait_on_bit() action functionsNeilBrown1-2/+2
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-12nfs4: queue free_lock_state job submission to nfsiodJeff Layton1-6/+18
We got a report of the following warning in Fedora: BUG: sleeping function called from invalid context at mm/slub.c:969 in_atomic(): 1, irqs_disabled(): 0, pid: 533, name: bash 3 locks held by bash/533: #0: (&sp->so_delegreturn_mutex){+.+...}, at: [<ffffffffa033da62>] nfs4_proc_lock+0x262/0x910 [nfsv4] #1: (&nfsi->rwsem){.+.+.+}, at: [<ffffffffa033da6a>] nfs4_proc_lock+0x26a/0x910 [nfsv4] #2: (&sb->s_type->i_lock_key#23){+.+...}, at: [<ffffffff812998dc>] flock_lock_file_wait+0x8c/0x3a0 CPU: 0 PID: 533 Comm: bash Not tainted 3.15.0-0.rc1.git1.1.fc21.x86_64 #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000000 00000000d664ff3c ffff880078b69a70 ffffffff817e82e0 0000000000000000 ffff880078b69a98 ffffffff810cf1a4 0000000000000050 0000000000000050 ffff88007cc01a00 ffff880078b69ad8 ffffffff8121449e Call Trace: [<ffffffff817e82e0>] dump_stack+0x4d/0x66 [<ffffffff810cf1a4>] __might_sleep+0x184/0x240 [<ffffffff8121449e>] kmem_cache_alloc_trace+0x4e/0x330 [<ffffffffa0331124>] ? nfs4_release_lockowner+0x74/0x110 [nfsv4] [<ffffffffa0331124>] nfs4_release_lockowner+0x74/0x110 [nfsv4] [<ffffffffa0352340>] nfs4_put_lock_state+0x90/0xb0 [nfsv4] [<ffffffffa0352375>] nfs4_fl_release_lock+0x15/0x20 [nfsv4] [<ffffffff81297515>] locks_free_lock+0x45/0x90 [<ffffffff8129996c>] flock_lock_file_wait+0x11c/0x3a0 [<ffffffffa033da6a>] ? nfs4_proc_lock+0x26a/0x910 [nfsv4] [<ffffffffa033301e>] do_vfs_lock+0x1e/0x30 [nfsv4] [<ffffffffa033da79>] nfs4_proc_lock+0x279/0x910 [nfsv4] [<ffffffff810dbb26>] ? local_clock+0x16/0x30 [<ffffffff810f5a3f>] ? lock_release_holdtime.part.28+0xf/0x200 [<ffffffffa02f820c>] do_unlk+0x8c/0xc0 [nfs] [<ffffffffa02f85c5>] nfs_flock+0xa5/0xf0 [nfs] [<ffffffff8129a6f6>] locks_remove_file+0xb6/0x1e0 [<ffffffff812159d8>] ? kfree+0xd8/0x2d0 [<ffffffff8123bc63>] __fput+0xd3/0x210 [<ffffffff8123bdee>] ____fput+0xe/0x10 [<ffffffff810bfb6d>] task_work_run+0xcd/0xf0 [<ffffffff81019cd1>] do_notify_resume+0x61/0x90 [<ffffffff817fbea2>] int_signal+0x12/0x17 The problem is that NFSv4 is trying to do an allocation from fl_release_private (in order to send a RELEASE_LOCKOWNER call). That function can be called while holding the inode->i_lock, and it's currently set up to do __GFP_WAIT allocations. v4.1 code has a similar problem. This patch adds a work_struct to the nfs4_lock_state and has the code queue the free_lock_state operation to nfsiod. Reported-by: Josh Stone <jistone@redhat.com> Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-07-12nfs4: treat lock owners as opaque valuesJeff Layton1-36/+9
Do the following set of ops with a file on a NFSv4 mount: exec 3>>/file/on/nfsv4 flock -x 3 exec 3>&- You'll see the LOCK request go across the wire, but no LOCKU when the file is closed. What happens is that the fd is passed across a fork, and the final close is done in a different process than the opener. That makes __nfs4_find_lock_state miss finding the correct lock state because it uses the fl_pid as a search key. A new one is created, and the locking code treats it as a delegation stateid (because NFS_LOCK_INITIALIZED isn't set). The root cause of this breakage seems to be commit 77041ed9b49a9e (NFSv4: Ensure the lockowners are labelled using the fl_owner and/or fl_pid). That changed it so that flock lockowners are allocated based on the fl_pid. I think this is incorrect. flock locks should be "owned" by the struct file, and that is already accounted for in the fl_owner field of the lock request when it comes through nfs_flock. This patch basically reverts the above commit and with it, a LOCKU is sent in the above reproducer. Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-10Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-3/+3
Pull NFS client updates from Trond Myklebust: "Highlights include: - massive cleanup of the NFS read/write code by Anna and Dros - support multiple NFS read/write requests per page in order to deal with non-page aligned pNFS striping. Also cleans up the r/wsize < page size code nicely. - stable fix for ensuring inode is declared uptodate only after all the attributes have been checked. - stable fix for a kernel Oops when remounting - NFS over RDMA client fixes - move the pNFS files layout driver into its own subdirectory" * tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) NFS: populate ->net in mount data when remounting pnfs: fix lockup caused by pnfs_generic_pg_test NFSv4.1: Fix typo in dprintk NFSv4.1: Comment is now wrong and redundant to code NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state xprtrdma: Disconnect on registration failure xprtrdma: Remove BUG_ON() call sites xprtrdma: Avoid deadlock when credit window is reset SUNRPC: Move congestion window constants to header file xprtrdma: Reset connection timeout after successful reconnect xprtrdma: Use macros for reconnection timeout constants xprtrdma: Allocate missing pagelist xprtrdma: Remove Tavor MTU setting xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting xprtrdma: Reduce the number of hardway buffer allocations xprtrdma: Limit work done by completion handler xprtrmda: Reduce calls to ib_poll_cq() in completion handlers xprtrmda: Reduce lock contention in completion handlers xprtrdma: Split the completion queue xprtrdma: Make rpcrdma_ep_destroy() return void ...