aboutsummaryrefslogtreecommitdiffstats
path: root/net/sunrpc/xprtrdma/xprt_rdma.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-18SUNRPC: Fix up backchannel slot table accountingTrond Myklebust1-0/+1
Add a per-transport maximum limit in the socket case, and add helpers to allow the NFSv4 code to discover that limit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-09xprtrdma: Modernize ops->connectChuck Lever1-0/+1
Adapt and apply changes that were made to the TCP socket connect code. See the following commits for details on the purpose of these changes: Commit 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly") Commit 3851f1cdb2b8 ("SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout") Commit 02910177aede ("SUNRPC: Fix reconnection timeouts") Some common transport code is moved to xprt.c to satisfy the code duplication police. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Remove rpcrdma_req::rl_bufferChuck Lever1-2/+2
Clean up. There is only one remaining function, rpcrdma_buffer_put(), that uses this field. Its caller can supply a pointer to the correct rpcrdma_buffer, enabling the removal of an 8-byte pointer field from a frequently-allocated shared data structure. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Wake RPCs directly in rpcrdma_wc_send pathChuck Lever1-9/+3
Eliminate a context switch in the path that handles RPC wake-ups when a Receive completion has to wait for a Send completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Reduce context switching due to Local InvalidationChuck Lever1-4/+4
Since commit ba69cd122ece ("xprtrdma: Remove support for FMR memory registration"), FRWR is the only supported memory registration mode. We can take advantage of the asynchronous nature of FRWR's LOCAL_INV Work Requests to get rid of the completion wait by having the LOCAL_INV completion handler take care of DMA unmapping MRs and waking the upper layer RPC waiter. This eliminates two context switches when local invalidation is necessary. As a side benefit, we will no longer need the per-xprt deferred completion work queue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Add mechanism to place MRs back on the free listChuck Lever1-0/+1
When a marshal operation fails, any MRs that were already set up for that request are recycled. Recycling releases MRs and creates new ones, which is expensive. Since commit f2877623082b ("xprtrdma: Chain Send to FastReg WRs") was merged, recycling FRWRs is unnecessary. This is because before that commit, frwr_map had already posted FAST_REG Work Requests, so ownership of the MRs had already been passed to the NIC and thus dealing with them had to be delayed until they completed. Since that commit, however, FAST_REG WRs are posted at the same time as the Send WR. This means that if marshaling fails, we are certain the MRs are safe to simply unmap and place back on the free list because neither the Send nor the FAST_REG WRs have been posted yet. The kernel still has ownership of the MRs at this point. This reduces the total number of MRs that the xprt has to create under heavy workloads and makes the marshaling logic less brittle. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Remove fr_stateChuck Lever1-10/+1
Now that both the Send and Receive completions are handled in process context, it is safe to DMA unmap and return MRs to the free or recycle lists directly in the completion handlers. Doing this means rpcrdma_frwr no longer needs to track the state of each MR, meaning that a VALID or FLUSHED MR can no longer appear on an xprt's MR free list. Thus there is no longer a need to track the MR's registration state in rpcrdma_frwr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Remove the RPCRDMA_REQ_F_PENDING flagChuck Lever1-1/+0
Commit 9590d083c1bb ("xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler") pins incoming RPC/RDMA replies so they can be left in the pending requests queue while they are being processed without introducing a race between ->buf_free and the transport's reply handler. Therefore RPCRDMA_REQ_F_PENDING is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Fix occasional transport deadlockChuck Lever1-6/+0
Under high I/O workloads, I've noticed that an RPC/RDMA transport occasionally deadlocks (IOPS goes to zero, and doesn't recover). Diagnosis shows that the sendctx queue is empty, but when sendctxs are returned to the queue, the xprt_write_space wake-up never occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy. I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented via an atomic bit. Just one of those is sufficient. Removing EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock un-reproducible. Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and is therefore removed. Unfortunately this patch does not apply cleanly to stable. If needed, someone will have to port it and test it. Fixes: 2fad659209d5 ("xprtrdma: Wait on empty sendctx queue") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Remove stale commentChuck Lever1-7/+0
The comment hasn't been accurate for several years. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Eliminate struct rpcrdma_create_data_internalChuck Lever1-17/+5
Clean up. Move the remaining field in rpcrdma_create_data_internal so the structure can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Aggregate the inline settings in struct rpcrdma_epChuck Lever1-4/+5
Clean up. The inline settings are actually a characteristic of the endpoint, and not related to the device. They are also modified after the transport instance is created, so they do not belong in the cdata structure either. Lastly, let's use names that are more natural to RDMA than to NFS: inline_write -> inline_send and inline_read -> inline_recv. The /proc files retain their names to avoid breaking user space. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Remove rpcrdma_create_data_internal::rsize and wsizeChuck Lever1-2/+0
Clean up. xprt_rdma_max_inline_{read,write} cannot be set to large values by virtue of proc_dointvec_minmax. The current maximum is RPCRDMA_MAX_INLINE, which is much smaller than RPCRDMA_MAX_SEGS * PAGE_SIZE. The .rsize and .wsize fields are otherwise unused in the current code base, and thus can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Eliminate rpcrdma_ia::ri_deviceChuck Lever1-4/+3
Clean up. Since commit 54cbd6b0c6b9 ("xprtrdma: Delay DMA mapping Send and Receive buffers"), a pointer to the device is now saved in each regbuf when it is DMA mapped. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: More Send completion batchingChuck Lever1-10/+0
Instead of using a fixed number, allow the amount of Send completion batching to vary based on the client's maximum credit limit. - A larger default gives a small boost to IOPS throughput - Reducing it based on max_requests gives a safe result when the max credit limit is cranked down (eg. when the device has a small max_qp_wr). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Clean up sendctx functionsChuck Lever1-2/+3
Minor clean-ups I've stumbled on since sendctx was merged last year. In particular, making Send completion processing more efficient appears to have a measurable impact on IOPS throughput. Note: test_and_clear_bit() returns a value, thus an explicit memory barrier is not necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Increase maximum number of backchannel requestsChuck Lever1-3/+5
Reflects the change introduced in commit 067c46967160 ("NFSv4.1: Bump the default callback session slot count to 16"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Clean up regbuf helpersChuck Lever1-9/+18
For code legibility, clean up the function names to be consistent with the pattern: "rpcrdma" _ object-type _ action Also rpcrdma_regbuf_alloc and rpcrdma_regbuf_free no longer have any callers outside of verbs.c, and can thus be made static. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: De-duplicate "allocate new, free old regbuf"Chuck Lever1-0/+2
Clean up by providing an API to do this common task. At this point, the difference between rpcrdma_get_sendbuf and rpcrdma_get_recvbuf has become tiny. These can be collapsed into a single helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Allocate req's regbufs at xprt create timeChuck Lever1-1/+1
Allocating an rpcrdma_req's regbufs at xprt create time enables a pair of micro-optimizations: First, if these regbufs are always there, we can eliminate two conditional branches from the hot xprt_rdma_allocate path. Second, by allocating a 1KB buffer, it places a lower bound on the size of these buffers, without adding yet another conditional branch. The lower bound reduces the number of hardway re- allocations. In fact, for some workloads it completely eliminates hardway allocations. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: rpcrdma_regbuf alignmentChuck Lever1-9/+10
Allocate the struct rpcrdma_regbuf separately from the I/O buffer to better guarantee the alignment of the I/O buffer and eliminate the wasted space between the rpcrdma_regbuf metadata and the buffer itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Clean up rpcrdma_create_req()Chuck Lever1-1/+2
Eventually, I'd like to invoke rpcrdma_create_req() during the call_reserve step. Memory allocation there probably needs to use GFP_NOIO. Therefore a set of GFP flags needs to be passed in. As an additional clean up, just return a pointer or NULL, because the only error return code here is -ENOMEM. Lastly, clean up the function names to be consistent with the pattern: "rpcrdma" _ object-type _ action Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13xprtrdma: Reduce the doorbell rate (Receive)Chuck Lever1-0/+10
Post RECV WRs in batches to reduce the hardware doorbell rate per transport. This helps the RPC-over-RDMA client scale better in number of transports. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13xprtrdma: Fix sparse warningsChuck Lever1-1/+1
linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: warning: incorrect type in argument 5 (different base types) linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: expected unsigned int [usertype] xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: got restricted __be32 [usertype] rq_xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: warning: incorrect type in argument 5 (different base types) linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: expected unsigned int [usertype] xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: got restricted __be32 [usertype] rq_xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: warning: incorrect type in argument 5 (different base types) linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: expected unsigned int [usertype] xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: got restricted __be32 [usertype] rq_xid Fixes: 0a93fbcb16e6 ("xprtrdma: Plant XID in on-the-wire RDMA ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds1-56/+24
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Yet another double DMA-unmap # v4.20 Features: - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG - Per-xprt rdma receive workqueues - Drop support for FMR memory registration - Make port= mount option optional for RDMA mounts Other bugfixes and cleanups: - Remove unused nfs4_xdev_fs_type declaration - Fix comments for behavior that has changed - Remove generic RPC credentials by switching to 'struct cred' - Fix crossing mountpoints with different auth flavors - Various xprtrdma fixes from testing and auditing the close code - Fixes for disconnect issues when using xprtrdma with krb5 - Clean up and improve xprtrdma trace points - Fix NFS v4.2 async copy reboot recovery" * tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) sunrpc: convert to DEFINE_SHOW_ATTRIBUTE sunrpc: Add xprt after nfs4_test_session_trunk() sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS sunrpc: handle ENOMEM in rpcb_getport_async NFS: remove unnecessary test for IS_ERR(cred) xprtrdma: Prevent leak of rpcrdma_rep objects NFSv4.2 fix async copy reboot recovery xprtrdma: Don't leak freed MRs xprtrdma: Add documenting comment for rpcrdma_buffer_destroy xprtrdma: Replace outdated comment for rpcrdma_ep_post xprtrdma: Update comments in frwr_op_send SUNRPC: Fix some kernel doc complaints SUNRPC: Simplify defining common RPC trace events NFS: Fix NFSv4 symbolic trace point output xprtrdma: Trace mapping, alloc, and dereg failures xprtrdma: Add trace points for calls to transport switch methods xprtrdma: Relocate the xprtrdma_mr_map trace points xprtrdma: Clean up of xprtrdma chunk trace points xprtrdma: Remove unused fields from rpcrdma_ia xprtrdma: Cull dprintk() call sites ...
2019-01-02xprtrdma: Remove unused fields from rpcrdma_iaChuck Lever1-2/+0
Clean up. The last use of these fields was in commit 173b8f49b3af ("xprtrdma: Demote "connect" log messages") . Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Simplify locking that protects the rl_allreqs listChuck Lever1-4/+3
Clean up: There's little chance of contention between the use of rb_lock and rb_reqslock, so merge the two. This avoids having to take both in some (possibly future) cases. Transport tear-down is already serialized, thus there is no need for locking at all when destroying rpcrdma_reqs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)Chuck Lever1-1/+1
Place the associated RPC transaction's XID in the upper 32 bits of each RDMA segment's rdma_offset field. There are two reasons to do this: - The R_key only has 8 bits that are different from registration to registration. The XID adds more uniqueness to each RDMA segment to reduce the likelihood of a software bug on the server reading from or writing into memory it's not supposed to. - On-the-wire RDMA Read and Write requests do not otherwise carry any identifier that matches them up to an RPC. The XID in the upper 32 bits will act as an eye-catcher in network captures. Suggested-by: Tom Talpey <ttalpey@microsoft.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Remove rpcrdma_memreg_opsChuck Lever1-31/+17
Clean up: Now that there is only FRWR, there is no need for a memory registration switch. The indirect calls to the memreg operations can be replaced with faster direct calls. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Remove support for FMR memory registrationChuck Lever1-11/+1
FMR is not supported on most recent RDMA devices. It is also less secure than FRWR because an FMR memory registration can expose adjacent bytes to remote reading or writing. As discussed during the RDMA BoF at LPC 2018, it is time to remove support for FMR in the NFS/RDMA client stack. Note that NFS/RDMA server-side uses either local memory registration or FRWR. FMR is not used. There are a few Infiniband/RoCE devices in the kernel tree that do not appear to support MEM_MGT_EXTENSIONS (FRWR), and therefore will not support client-side NFS/RDMA after this patch. These are: - mthca - qib - hns (RoCE) Users of these devices can use NFS/TCP on IPoIB instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Don't wake pending tasks until disconnect is doneChuck Lever1-0/+1
Transport disconnect processing does a "wake pending tasks" at various points. Suppose an RPC Reply is being processed. The RPC task that Reply goes with is waiting on the pending queue. If a disconnect wake-up happens before reply processing is done, that reply, even if it is good, is thrown away, and the RPC has to be sent again. This window apparently does not exist for socket transports because there is a lock held while a reply is being received which prevents the wake-up call until after reply processing is done. To resolve this, all RPC replies being processed on an RPC-over-RDMA transport have to complete before pending tasks are awoken due to a transport disconnect. Callers that already hold the transport write lock may invoke ->ops->close directly. Others use a generic helper that schedules a close when the write lock can be taken safely. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: No qp_event disconnectChuck Lever1-1/+0
After thinking about this more, and auditing other kernel ULP imple- mentations, I believe that a DISCONNECT cm_event will occur after a fatal QP event. If that's the case, there's no need for an explicit disconnect in the QP event handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueueChuck Lever1-5/+1
To address a connection-close ordering problem, we need the ability to drain the RPC completions running on rpcrdma_receive_wq for just one transport. Give each transport its own RPC completion workqueue, and drain that workqueue when disconnecting the transport. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Refactor Receive accountingChuck Lever1-2/+1
Clean up: Divide the work cleanly: - rpcrdma_wc_receive is responsible only for RDMA Receives - rpcrdma_reply_handler is responsible only for RPC Replies - the posted send and receive counts both belong in rpcrdma_ep Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-27sunrpc: remove unused bc_up operation from rpc_xprt_opsVasily Averin1-1/+0
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-03xprtrdma: Simplify RPC wake-ups on connectChuck Lever1-3/+1
Currently, when a connection is established, rpcrdma_conn_upcall invokes rpcrdma_conn_func and then wake_up_all(&ep->rep_connect_wait). The former wakes waiting RPCs, but the connect worker is not done yet, and that leads to races, double wakes, and difficulty understanding how this logic is supposed to work. Instead, collect all the "connection established" logic in the connect worker (xprt_rdma_connect_worker). A disconnect worker is retained to handle provider upcalls safely. Fixes: 254f91e2fa1f ("xprtrdma: RPC/RDMA must invoke ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Explicitly resetting MRs is no longer necessaryChuck Lever1-6/+8
When a memory operation fails, the MR's driver state might not match its hardware state. The only reliable recourse is to dereg the MR. This is done in ->ro_recover_mr, which then attempts to allocate a fresh MR to replace the released MR. Since commit e2ac236c0b651 ("xprtrdma: Allocate MRs on demand"), xprtrdma dynamically allocates MRs. It can add more MRs whenever they are needed. That makes it possible to simply release an MR when a memory operation fails, instead of "recovering" it. It will automatically be replaced by the on-demand MR allocator. This commit is a little larger than I wanted, but it replaces ->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the rb_stale_mrs list with a generic work queue. Since MRs are no longer orphaned, the mrs_orphaned metric is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-06-12Merge tag 'nfs-for-4.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-14/+12
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fix a 1-byte stack overflow in nfs_idmap_read_and_verify_message - Fix a hang due to incorrect error returns in rpcrdma_convert_iovs() - Revert an incorrect change to the NFSv4.1 callback channel - Fix a bug in the NFSv4.1 sequence error handling Features and optimisations: - Support for piggybacking a LAYOUTGET operation to the OPEN compound - RDMA performance enhancements to deal with transport congestion - Add proper SPDX tags for NetApp-contributed RDMA source - Do not request delegated file attributes (size+change) from the server - Optimise away a GETATTR in the lookup revalidate code when doing NFSv4 OPEN - Optimise away unnecessary lookups for rename targets - Misc performance improvements when freeing NFSv4 delegations Bugfixes and cleanups: - Try to fail quickly if proto=rdma - Clean up RDMA receive trace points - Fix sillyrename to return the delegation when appropriate - Misc attribute revalidation fixes - Immediately clear the pNFS layout on a file when the server returns ESTALE - Return NFS4ERR_DELAY when delegation/layout recalls fail due to igrab() - Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY" * tag 'nfs-for-4.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (80 commits) skip LAYOUTRETURN if layout is invalid NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY NFSv4: Fix a typo in nfs41_sequence_process NFSv4: Revert commit 5f83d86cf531d ("NFSv4.x: Fix wraparound issues..") NFSv4: Return NFS4ERR_DELAY when a layout recall fails due to igrab() NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab() NFSv4.0: Remove transport protocol name from non-UCS client ID NFSv4.0: Remove cl_ipaddr from non-UCS client ID NFSv4: Fix a compiler warning when CONFIG_NFS_V4_1 is undefined NFS: Filter cache invalidation when holding a delegation NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes() NFS: Improve caching while holding a delegation NFS: Fix attribute revalidation NFS: fix up nfs_setattr_update_inode NFSv4: Ensure the inode is clean when we set a delegation NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access NFSv4: Don't ask for delegated attributes when adding a hard link NFSv4: Don't ask for delegated attributes when revalidating the inode NFS: Pass the inode down to the getattr() callback NFSv4: Don't request size+change attribute if they are delegated to us ...
2018-06-12Merge tag 'nfsd-4.18' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-2/+0
Pull nfsd updates from Bruce Fields: "A relatively quiet cycle for nfsd. The largest piece is an RDMA update from Chuck Lever with new trace points, miscellaneous cleanups, and streamlining of the send and receive paths. Other than that, some miscellaneous bugfixes" * tag 'nfsd-4.18' of git://linux-nfs.org/~bfields/linux: (26 commits) nfsd: fix error handling in nfs4_set_delegation() nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo Fix 16-byte memory leak in gssp_accept_sec_context_upcall svcrdma: Fix incorrect return value/type in svc_rdma_post_recvs svcrdma: Remove unused svc_rdma_op_ctxt svcrdma: Persistently allocate and DMA-map Send buffers svcrdma: Simplify svc_rdma_send() svcrdma: Remove post_send_wr svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt svcrdma: Introduce svc_rdma_send_ctxt svcrdma: Clean up Send SGE accounting svcrdma: Refactor svc_rdma_dma_map_buf svcrdma: Allocate recv_ctxt's on CPU handling Receives svcrdma: Persistently allocate and DMA-map Receive buffers svcrdma: Preserve Receive buffer until svc_rdma_sendto svcrdma: Simplify svc_rdma_recv_ctxt_put svcrdma: Remove sc_rq_depth svcrdma: Introduce svc_rdma_recv_ctxt svcrdma: Trace key RDMA API events svcrdma: Trace key RPC/RDMA protocol events ...
2018-06-04Merge tag 'nfs-rdma-for-4.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfsTrond Myklebust1-14/+12
NFS-over-RDMA client updates for Linux 4.18 Stable patches: - xprtrdma: Return -ENOBUFS when no pages are available New features: - Add ->alloc_slot() and ->free_slot() functions Bugfixes and cleanups: - Add missing SPDX tags to some files - Try to fail mount quickly if client has no RDMA devices - Create transport IDs in the correct network namespace - Fix max_send_wr computation - Clean up receive tracepoints - Refactor receive handling - Remove unused functions
2018-06-01xprtrdma: Wait on empty sendctx queueChuck Lever1-0/+6
Currently, when the sendctx queue is exhausted during marshaling, the RPC/RDMA transport places the RPC task on the delayq, which forces a wait for HZ >> 2 before the marshal and send is retried. With this change, the transport now places such an RPC task on the pending queue, and wakes it just as soon as more sendctxs become available. This typically takes less than a millisecond, and the write_space waking mechanism is less deadlock-prone. Moreover, the waiting RPC task is holding the transport's write lock, which blocks the transport from sending RPCs. Therefore faster recovery from sendctx queue exhaustion is desirable. Cf. commit 5804891455d5 ("xprtrdma: ->send_request returns -EAGAIN when there are no free MRs"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-11xprtrdma: Prepare RPC/RDMA includes for server-side trace pointsChuck Lever1-2/+0
Clean up: Move #include <trace/events/rpcrdma.h> into source files, similar to how it is done with trace/events/sunrpc.h. Server-side trace points will be part of the rpcrdma subsystem, just like the client-side trace points. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-07xprtrdma: Make rpcrdma_sendctx_put_locked() a static functionChuck Lever1-1/+0
Clean up: The only call site is in the same file as the function's definition. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-07xprtrdma: Remove rpcrdma_ep_{post_recv, post_extra_recv}Chuck Lever1-3/+0
Clean up: These functions are no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-07xprtrdma: Move Receive posting to Receive handlerChuck Lever1-3/+3
Receive completion and Reply handling are done by a BOUND workqueue, meaning they run on only one CPU. Posting receives is currently done in the send_request path, which on large systems is typically done on a different CPU than the one handling Receive completions. This results in movement of Receive-related cachelines between the sending and receiving CPUs. More importantly, it means that currently Receives are posted while the transport's write lock is held, which is unnecessary and costly. Finally, allocation of Receive buffers is performed on-demand in the Receive completion handler. This helps guarantee that they are allocated on the same NUMA node as the CPU that handles Receive completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-07xprtrdma: Make rpc_rqst part of rpcrdma_reqChuck Lever1-7/+2
This simplifies allocation of the generic RPC slot and xprtrdma specific per-RPC resources. It also makes xprtrdma more like the socket-based transports: ->buf_alloc and ->buf_free are now responsible only for send and receive buffers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-07xprtrdma: Add proper SPDX tags for NetApp-contributed sourceChuck Lever1-0/+1
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-01xprtrdma: Fix list corruption / DMAR errors during MR recoveryChuck Lever1-1/+1
The ro_release_mr methods check whether mr->mr_list is empty. Therefore, be sure to always use list_del_init when removing an MR linked into a list using that field. Otherwise, when recovering from transport failures or device removal, list corruption can result, or MRs can get mapped or unmapped an odd number of times, resulting in IOMMU-related failures. In general this fix is appropriate back to v4.8. However, code changes since then make it impossible to apply this patch directly to stable kernels. The fix would have to be applied by hand or reworked for kernels earlier than v4.16. Backport guidance -- there are several cases: - When creating an MR, initialize mr_list so that using list_empty on an as-yet-unused MR is safe. - When an MR is being handled by the remote invalidation path, ensure that mr_list is reinitialized when it is removed from rl_registered. - When an MR is being handled by rpcrdma_destroy_mrs, it is removed from mr_all, but it may still be on an rl_registered list. In that case, the MR needs to be removed from that list before being released. - Other cases are covered by using list_del_init in rpcrdma_mr_pop. Fixes: 9d6b04097882 ('xprtrdma: Place registered MWs on a ... ') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10xprtrdma: Chain Send to FastReg WRsChuck Lever1-0/+2
With FRWR, the client transport can perform memory registration and post a Send with just a single ib_post_send. This reduces contention between the send_request path and the Send Completion handlers, and reduces the overhead of registering a chunk that has multiple segments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10xprtrdma: Remove xprt-specific connect cookieChuck Lever1-1/+0
Clean up: The generic rq_connect_cookie is sufficient to detect RPC Call retransmission. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>