aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/net/sunrpc/xprtrdma/rpc_rdma.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-04-20xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}Chuck Lever1-4/+11
These new helpers do not return 0 on success, they return the encoded size. Thus they are not a drop-in replacement for the old helpers. Fixes: 5c266df52701 ("SUNRPC: Add encoders for list item discriminators") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-04-07Merge tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-16/+16
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fix a page leak in nfs_destroy_unlinked_subrequests() - Fix use-after-free issues in nfs_pageio_add_request() - Fix new mount code constant_table array definitions - finish_automount() requires us to hold 2 refs to the mount record Features: - Improve the accuracy of telldir/seekdir by using 64-bit cookies when possible. - Allow one RDMA active connection and several zombie connections to prevent blocking if the remote server is unresponsive. - Limit the size of the NFS access cache by default - Reduce the number of references to credentials that are taken by NFS - pNFS files and flexfiles drivers now support per-layout segment COMMIT lists. - Enable partial-file layout segments in the pNFS/flexfiles driver. - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from the DS using the layouterror mechanism. Bugfixes and cleanups: - SUNRPC: Fix krb5p regressions - Don't specify NFS version in "UDP not supported" error - nfsroot: set tcp as the default transport protocol - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid() - alloc_nfs_open_context() must use the file cred when available - Fix locking when dereferencing the delegation cred - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails - Various clean ups of the NFS O_DIRECT commit code - Clean up RDMA connect/disconnect - Replace zero-length arrays with C99-style flexible arrays" * tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits) NFS: Clean up process of marking inode stale. SUNRPC: Don't start a timer on an already queued rpc task NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn() NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode() NFS: Beware when dereferencing the delegation cred NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout NFS: finish_automount() requires us to hold 2 refs to the mount record NFS: Fix a few constant_table array definitions NFS: Try to join page groups before an O_DIRECT retransmission NFS: Refactor nfs_lock_and_join_requests() NFS: Reverse the submission order of requests in __nfs_pageio_add_request() NFS: Clean up nfs_lock_and_join_requests() NFS: Remove the redundant function nfs_pgio_has_mirroring() NFS: Fix memory leaks in nfs_pageio_stop_mirroring() NFS: Fix a request reference leak in nfs_direct_write_clear_reqs() NFS: Fix use-after-free issues in nfs_pageio_add_request() NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests() NFS: Fix a page leak in nfs_destroy_unlinked_subrequests() NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio() pNFS/flexfiles: Specify the layout segment range in LAYOUTGET ...
2020-03-27xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprtChuck Lever1-8/+9
Change the rpcrdma_xprt_disconnect() function so that it no longer waits for the DISCONNECTED event. This prevents blocking if the remote is unresponsive. In rpcrdma_xprt_disconnect(), the transport's rpcrdma_ep is detached. Upon return from rpcrdma_xprt_disconnect(), the transport (r_xprt) is ready immediately for a new connection. The RDMA_CM_DEVICE_REMOVAL and RDMA_CM_DISCONNECTED events are now handled almost identically. However, because the lifetimes of rpcrdma_xprt structures and rpcrdma_ep structures are now independent, creating an rpcrdma_ep needs to take a module ref count. The ep now owns most of the hardware resources for a transport. Also, a kref is needed to ensure that rpcrdma_ep sticks around long enough for the cm_event_handler to finish. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_epChuck Lever1-16/+15
I eventually want to allocate rpcrdma_ep separately from struct rpcrdma_xprt so that on occasion there can be more than one ep per xprt. The new struct rpcrdma_ep will contain all the fields currently in rpcrdma_ia and in rpcrdma_ep. This is all the device and CM settings for the connection, in addition to per-connection settings negotiated with the remote. Take this opportunity to rename the existing ep fields from rep_* to re_* to disambiguate these from struct rpcrdma_rep. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-16SUNRPC: Add encoders for list item discriminatorsChuck Lever1-31/+5
Clean up. These are taken from the client-side RPC/RDMA transport to a more global header file so they can be used elsewhere. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-01-15xprtrdma: Allocate and map transport header buffers at connect timeChuck Lever1-7/+3
Currently the underlying RDMA device is chosen at transport set-up time. But it will soon be at connect time instead. The maximum size of a transport header is based on device capabilities. Thus transport header buffers have to be allocated _after_ the underlying device has been chosen (via address and route resolution); ie, in the connect worker. Thus, move the allocation of transport header buffers to the connect worker, after the point at which the underlying RDMA device has been chosen. This also means the RDMA device is available to do a DMA mapping of these buffers at connect time, instead of in the hot I/O path. Make that optimization as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Eliminate per-transport "max pages"Chuck Lever1-1/+1
To support device hotplug and migrating a connection between devices of different capabilities, we have to guarantee that all in-kernel devices can support the same max NFS payload size (1 megabyte). This means that possibly one or two in-tree devices are no longer supported for NFS/RDMA because they cannot support 1MB rsize/wsize. The only one I confirmed was cxgb3, but it has already been removed from the kernel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Refactor initialization of ep->rep_max_requestsChuck Lever1-3/+3
Clean up: there is no need to keep two copies of the same value. Also, in subsequent patches, rpcrdma_ep_create() will be called in the connect worker rather than at set-up time. Minor fix: Initialize the transport's sendctx to the value based on the capabilities of the underlying device, not the maximum setting. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15xprtrdma: Eliminate ri_max_send_sgesChuck Lever1-1/+1
Clean-up. The max_send_sge value also happens to be stored in ep->rep_attr. Let's keep just a single copy. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Replace dprintk() in rpcrdma_update_connect_private()Chuck Lever1-4/+0
Clean up: Use a single trace point to record each connection's negotiated inline thresholds and the computed maximum byte size of transport headers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Refine trace_xprtrdma_fixupChuck Lever1-3/+2
Slightly reduce overhead and display more useful information. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Pull up sometimesChuck Lever1-5/+77
On some platforms, DMA mapping part of a page is more costly than copying bytes. Restore the pull-up code and use that when we think it's going to be faster. The heuristic for now is to pull-up when the size of the RPC message body fits in the buffer underlying the head iovec. Indeed, not involving the I/O MMU can help the RPC/RDMA transport scale better for tiny I/Os across more RDMA devices. This is because interaction with the I/O MMU is eliminated, as is handling a Send completion, for each of these small I/Os. Without the explicit unmapping, the NIC no longer needs to do a costly internal TLB shoot down for buffers that are just a handful of bytes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Refactor rpcrdma_prepare_msg_sges()Chuck Lever1-117/+146
Refactor: Replace spaghetti with code that makes it plain what needs to be done for each rtype. This makes it easier to add features and optimizations. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Move the rpcrdma_sendctx::sc_wr fieldChuck Lever1-3/+6
Clean up: This field is not needed in the Send completion handler, so it can be moved to struct rpcrdma_req to reduce the size of struct rpcrdma_sendctx, and to reduce the amount of memory that is sloshed between the sending process and the Send completion process. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Remove rpcrdma_sendctx::sc_deviceChuck Lever1-2/+2
Micro-optimization: Save eight bytes in a frequently allocated structure. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Manage MRs in context of a single connectionChuck Lever1-8/+1
MRs are now allocated on demand so we can safely throw them away on disconnect. This way an idle transport can disconnect and it won't pin hardware MR resources. Two additional changes: - Now that all MRs are destroyed on disconnect, there's no need to check during header marshaling if a req has MRs to recycle. Each req is sent only once per connection, and now rl_registered is guaranteed to be empty when rpcrdma_marshal_req is invoked. - Because MRs are now destroyed in a WQ_MEM_RECLAIM context, they also must be allocated in a WQ_MEM_RECLAIM context. This reduces the likelihood that device driver memory allocation will trigger memory reclaim during NFS writeback. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Close window between waking RPC senders and posting ReceivesChuck Lever1-0/+1
A recent clean up attempted to separate Receive handling and RPC Reply processing, in the name of clean layering. Unfortunately, we can't do this because the Receive Queue has to be refilled _after_ the most recent credit update from the responder is parsed from the transport header, but _before_ we wake up the next RPC sender. That is right in the middle of rpcrdma_reply_handler(). Usually this isn't a problem because current responder implementations don't vary their credit grant. The one exception is when a connection is established: the grant goes from one to a much larger number on the first Receive. The requester MUST post enough Receives right then so that any outstanding requests can be sent without risking RNR and connection loss. Fixes: 6ceea36890a0 ("xprtrdma: Refactor Receive accounting") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24xprtrdma: Initialize rb_credits in one placeChuck Lever1-6/+36
Clean up/code de-duplication. Nit: RPC_CWNDSHIFT is incorrect as the initial value for xprt->cwnd. This mistake does not appear to have operational consequences, since the cwnd value is replaced with a valid value upon the first Receive completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-26xprtrdma: Clear xprt->reestablish_timeout on closeChuck Lever1-2/+6
Ensure that the re-establishment delay does not grow exponentially on each good reconnect. This probably should have been part of commit 675dd90ad093 ("xprtrdma: Modernize ops->connect"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-26xprtrdma: Recycle MRs after disconnectChuck Lever1-1/+1
The optimization done in "xprtrdma: Simplify rpcrdma_mr_pop" was a bit too optimistic. MRs left over after a reconnect still need to be recycled, not added back to the free list, since they could be in flight or actually fully registered. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Inline XDR chunk encoder functionsChuck Lever1-9/+12
Micro-optimization: Save the cost of three function calls during transport header encoding. These were "noinline" before to generate more meaningful call stacks during debugging, but this code is now pretty stable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Cache free MRs in each rpcrdma_reqChuck Lever1-3/+8
Instead of a globally-contended MR free list, cache MRs in each rpcrdma_req as they are released. This means acquiring and releasing an MR will be lock-free in the common case, even outside the transport send lock. The original idea of per-rpcrdma_req MR free lists was suggested by Shirley Ma <shirley.ma@oracle.com> several years ago. I just now figured out how to make that idea work with on-demand MR allocation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Move rpcrdma_mr_get out of frwr_mapChuck Lever1-6/+24
Refactor: Retrieve an MR and handle error recovery entirely in rpc_rdma.c, as this is not a device-specific function. Note that since commit 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue"), the xprt_transmit function handles the cond_resched. The transport no longer has to do this itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Simplify rpcrdma_mr_popChuck Lever1-6/+1
Clean up: rpcrdma_mr_pop call sites check if the list is empty first. Let's replace the list_empty with less costly logic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-12Merge tag 'nfs-rdma-for-5.3-1' of git://git.linux-nfs.org/projects/anna/linux-nfsTrond Myklebust1-85/+63
NFSoRDMA client updates for 5.3 New features: - Add a way to place MRs back on the free list - Reduce context switching - Add new trace events Bugfixes and cleanups: - Fix a BUG when tracing is enabled with NFSv4.1 - Fix a use-after-free in rpcrdma_post_recvs - Replace use of xdr_stream_pos in rpcrdma_marshal_req - Fix occasional transport deadlock - Fix show_nfs_errors macros, other tracing improvements - Remove RPCRDMA_REQ_F_PENDING and fr_state - Various simplifications and refactors
2019-07-09xprtrdma: Refactor chunk encodingChuck Lever1-20/+16
Clean up. Move the "not present" case into the individual chunk encoders. This improves code organization and readability. The reason for the original organization was to optimize for the case where there there are no chunks. The optimization turned out to be inconsequential, so let's err on the side of code readability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Wake RPCs directly in rpcrdma_wc_send pathChuck Lever1-38/+23
Eliminate a context switch in the path that handles RPC wake-ups when a Receive completion has to wait for a Send completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Reduce context switching due to Local InvalidationChuck Lever1-31/+30
Since commit ba69cd122ece ("xprtrdma: Remove support for FMR memory registration"), FRWR is the only supported memory registration mode. We can take advantage of the asynchronous nature of FRWR's LOCAL_INV Work Requests to get rid of the completion wait by having the LOCAL_INV completion handler take care of DMA unmapping MRs and waking the upper layer RPC waiter. This eliminates two context switches when local invalidation is necessary. As a side benefit, we will no longer need the per-xprt deferred completion work queue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Add mechanism to place MRs back on the free listChuck Lever1-0/+1
When a marshal operation fails, any MRs that were already set up for that request are recycled. Recycling releases MRs and creates new ones, which is expensive. Since commit f2877623082b ("xprtrdma: Chain Send to FastReg WRs") was merged, recycling FRWRs is unnecessary. This is because before that commit, frwr_map had already posted FAST_REG Work Requests, so ownership of the MRs had already been passed to the NIC and thus dealing with them had to be delayed until they completed. Since that commit, however, FAST_REG WRs are posted at the same time as the Send WR. This means that if marshaling fails, we are certain the MRs are safe to simply unmap and place back on the free list because neither the Send nor the FAST_REG WRs have been posted yet. The kernel still has ownership of the MRs at this point. This reduces the total number of MRs that the xprt has to create under heavy workloads and makes the marshaling logic less brittle. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Remove fr_stateChuck Lever1-1/+1
Now that both the Send and Receive completions are handled in process context, it is safe to DMA unmap and return MRs to the free or recycle lists directly in the completion handlers. Doing this means rpcrdma_frwr no longer needs to track the state of each MR, meaning that a VALID or FLUSHED MR can no longer appear on an xprt's MR free list. Thus there is no longer a need to track the MR's registration state in rpcrdma_frwr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Remove the RPCRDMA_REQ_F_PENDING flagChuck Lever1-1/+0
Commit 9590d083c1bb ("xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler") pins incoming RPC/RDMA replies so they can be left in the pending requests queue while they are being processed without introducing a race between ->buf_free and the transport's reply handler. Therefore RPCRDMA_REQ_F_PENDING is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Fix occasional transport deadlockChuck Lever1-14/+12
Under high I/O workloads, I've noticed that an RPC/RDMA transport occasionally deadlocks (IOPS goes to zero, and doesn't recover). Diagnosis shows that the sendctx queue is empty, but when sendctxs are returned to the queue, the xprt_write_space wake-up never occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy. I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented via an atomic bit. Just one of those is sufficient. Removing EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock un-reproducible. Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and is therefore removed. Unfortunately this patch does not apply cleanly to stable. If needed, someone will have to port it and test it. Fixes: 2fad659209d5 ("xprtrdma: Wait on empty sendctx queue") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09xprtrdma: Replace use of xdr_stream_pos in rpcrdma_marshal_reqChuck Lever1-3/+3
This is a latent bug. xdr_stream_pos works by subtracting xdr_stream::nwords from xdr_buf::len. But xdr_stream::nwords is not initialized by xdr_init_encode(). It works today only because all fields in rpcrdma_req::rl_stream are initialized to zero by rpcrdma_req_create, making the subtraction in xdr_stream_pos always a no-op. I found this issue via code inspection. It was introduced by commit 39f4cd9e9982 ("xprtrdma: Harden chunk list encoding against send buffer overflow"), but the code has changed enough since then that this fix can't be automatically applied to stable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-06SUNRPC: Remove the bh-safe lock requirement on xprt->transport_lockTrond Myklebust1-2/+2
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-04-25xprtrdma: Aggregate the inline settings in struct rpcrdma_epChuck Lever1-15/+19
Clean up. The inline settings are actually a characteristic of the endpoint, and not related to the device. They are also modified after the transport instance is created, so they do not belong in the cdata structure either. Lastly, let's use names that are more natural to RDMA than to NFS: inline_write -> inline_send and inline_read -> inline_recv. The /proc files retain their names to avoid breaking user space. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Clean up sendctx functionsChuck Lever1-15/+12
Minor clean-ups I've stumbled on since sendctx was merged last year. In particular, making Send completion processing more efficient appears to have a measurable impact on IOPS throughput. Note: test_and_clear_bit() returns a value, thus an explicit memory barrier is not necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Trace marshaling failuresChuck Lever1-0/+1
Record an event when rpcrdma_marshal_req returns a non-zero return value to help track down why an xprt close might have occurred. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: Clean up regbuf helpersChuck Lever1-24/+23
For code legibility, clean up the function names to be consistent with the pattern: "rpcrdma" _ object-type _ action Also rpcrdma_regbuf_alloc and rpcrdma_regbuf_free no longer have any callers outside of verbs.c, and can thus be made static. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25xprtrdma: rpcrdma_regbuf alignmentChuck Lever1-2/+2
Allocate the struct rpcrdma_regbuf separately from the I/O buffer to better guarantee the alignment of the I/O buffer and eliminate the wasted space between the rpcrdma_regbuf metadata and the buffer itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25SUNRPC: Avoid digging into the ATOMIC poolChuck Lever1-1/+1
Page allocation requests made when the SPARSE_PAGES flag is set are allowed to fail, and are not critical. No need to spend a rare resource. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Add xdr_stream::rqst fieldChuck Lever1-2/+2
Having access to the controlling rpc_rqst means a trace point in the XDR code can report: - the XID - the task ID and client ID - the p_name of RPC being processed Subsequent patches will introduce such trace points. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13xprtrdma: Check inline size before providing a Write chunkChuck Lever1-1/+17
In very rare cases, an NFS READ operation might predict that the non-payload part of the RPC Call is large. For instance, an NFSv4 COMPOUND with a large GETATTR result, in combination with a large Kerberos credential, could push the non-payload part to be several kilobytes. If the non-payload part is larger than the connection's inline threshold, the client is required to provision a Reply chunk. The current Linux client does not check for this case. There are two obvious ways to handle it: a. Provision a Write chunk for the payload and a Reply chunk for the non-payload part b. Provision a Reply chunk for the whole RPC Reply Some testing at a recent NFS bake-a-thon showed that servers can mostly handle a. but there are some corner cases that do not work yet. b. already works (it has to, to handle krb5i/p), but could be somewhat less efficient. However, I expect this scenario to be very rare -- no-one has reported a problem yet. So I'm going to implement b. Sometime later I will provide some patches to help make b. a little more efficient by more carefully choosing the Reply chunk's segment sizes to ensure the payload is optimally aligned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Prevent leak of rpcrdma_rep objectsChuck Lever1-0/+4
If a reply has been processed but the RPC is later retransmitted anyway, the req->rl_reply field still contains the only pointer to the old rpcrdma rep. When the next reply comes in, the reply handler will stomp on the rl_reply field, leaking the old rep. A trace event is added to capture such leaks. This problem seems to be worsened by the restructuring of the RPC Call path in v4.20. Fully addressing this issue will require at least a re-architecture of the disconnect logic, which is not appropriate during -rc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Trace mapping, alloc, and dereg failuresChuck Lever1-1/+1
These are rare, but can be helpful at tracking down DMAR and other problems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Clean up of xprtrdma chunk trace pointsChuck Lever1-3/+3
The chunk-related trace points capture nearly the same information as the MR-related trace points. Also, rename them so globbing can be used to enable or disable these trace points more easily. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Cull dprintk() call sitesChuck Lever1-7/+10
Clean up: Remove dprintk() call sites that report rare or impossible errors. Leave a few that display high-value low noise status information. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Expose transport header errorsChuck Lever1-1/+0
For better observability of parsing errors, return the error code generated in the decoders to the upper layer consumer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Recognize XDRBUF_SPARSE_PAGESChuck Lever1-5/+6
Commit 431f6eb3570f ("SUNRPC: Add a label for RPC calls that require allocation on receive") didn't update similar logic in rpc_rdma.c. I don't think this is a bug, per-se; the commit just adds more careful checking for broken upper layer behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)Chuck Lever1-3/+3
Place the associated RPC transaction's XID in the upper 32 bits of each RDMA segment's rdma_offset field. There are two reasons to do this: - The R_key only has 8 bits that are different from registration to registration. The XID adds more uniqueness to each RDMA segment to reduce the likelihood of a software bug on the server reading from or writing into memory it's not supposed to. - On-the-wire RDMA Read and Write requests do not otherwise carry any identifier that matches them up to an RPC. The XID in the upper 32 bits will act as an eye-catcher in network captures. Suggested-by: Tom Talpey <ttalpey@microsoft.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Remove rpcrdma_memreg_opsChuck Lever1-9/+5
Clean up: Now that there is only FRWR, there is no need for a memory registration switch. The indirect calls to the memreg operations can be replaced with faster direct calls. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>