Diffstat (limited to 'Documentation/filesystems/nfs')
-rw-r--r-- | Documentation/filesystems/nfs/client-identifier.rst | 216
-rw-r--r-- | Documentation/filesystems/nfs/exporting.rst | 52
-rw-r--r-- | Documentation/filesystems/nfs/index.rst | 16
-rw-r--r-- | Documentation/filesystems/nfs/knfsd-stats.rst (renamed from Documentation/filesystems/nfs/knfsd-stats.txt) | 17
-rw-r--r-- | Documentation/filesystems/nfs/nfs41-server.rst | 256
-rw-r--r-- | Documentation/filesystems/nfs/nfs41-server.txt | 173
-rw-r--r-- | Documentation/filesystems/nfs/pnfs.rst (renamed from Documentation/filesystems/nfs/pnfs.txt) | 25
-rw-r--r-- | Documentation/filesystems/nfs/reexport.rst | 113
-rw-r--r-- | Documentation/filesystems/nfs/rpc-cache.rst (renamed from Documentation/filesystems/nfs/rpc-cache.txt) | 136
-rw-r--r-- | Documentation/filesystems/nfs/rpc-server-gss.rst (renamed from Documentation/filesystems/nfs/rpc-server-gss.txt) | 28
10 files changed, 768 insertions, 264 deletions
diff --git a/Documentation/filesystems/nfs/client-identifier.rst b/Documentation/filesystems/nfs/client-identifier.rst new file mode 100644 index 000000000000..5147e15815a1 --- /dev/null +++ b/Documentation/filesystems/nfs/client-identifier.rst @@ -0,0 +1,216 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================= +NFSv4 client identifier +======================= + +This document explains how the NFSv4 protocol identifies client +instances in order to maintain file open and lock state during +system restarts. A special identifier and principal are maintained +on each client. These can be set by administrators, scripts +provided by site administrators, or tools provided by Linux +distributors. + +There are risks if a client's NFSv4 identifier and its principal +are not chosen carefully. + + +Introduction +------------ + +The NFSv4 protocol uses "lease-based file locking". Leases help +NFSv4 servers provide file lock guarantees and manage their +resources. + +Simply put, an NFSv4 server creates a lease for each NFSv4 client. +The server collects each client's file open and lock state under +the lease for that client. + +The client is responsible for periodically renewing its leases. +While a lease remains valid, the server holding that lease +guarantees the file locks the client has created remain in place. + +If a client stops renewing its lease (for example, if it crashes), +the NFSv4 protocol allows the server to remove the client's open +and lock state after a certain period of time. When a client +restarts, it indicates to servers that open and lock state +associated with its previous leases is no longer valid and can be +destroyed immediately. + +In addition, each NFSv4 server manages a persistent list of client +leases. When the server restarts and clients attempt to recover +their state, the server uses this list to distinguish amongst +clients that held state before the server restarted and clients +sending fresh OPEN and LOCK requests. This enables file locks to +persist safely across server restarts. + +NFSv4 client identifiers +------------------------ + +Each NFSv4 client presents an identifier to NFSv4 servers so that +they can associate the client with its lease. Each client's +identifier consists of two elements: + + - co_ownerid: An arbitrary but fixed string. + + - boot verifier: A 64-bit incarnation verifier that enables a + server to distinguish successive boot epochs of the same client. + +The NFSv4.0 specification refers to these two items as an +"nfs_client_id4". The NFSv4.1 specification refers to these two +items as a "client_owner4". + +NFSv4 servers tie this identifier to the principal and security +flavor that the client used when presenting it. Servers use this +principal to authorize subsequent lease modification operations +sent by the client. Effectively this principal is a third element of +the identifier. + +As part of the identity presented to servers, a good +"co_ownerid" string has several important properties: + + - The "co_ownerid" string identifies the client during reboot + recovery, therefore the string is persistent across client + reboots. + - The "co_ownerid" string helps servers distinguish the client + from others, therefore the string is globally unique. Note + that there is no central authority that assigns "co_ownerid" + strings. + - Because it often appears on the network in the clear, the + "co_ownerid" string does not reveal private information about + the client itself. 
+   - The content of the "co_ownerid" string is set and unchanging
+     before the client attempts NFSv4 mounts after a restart.
+   - The NFSv4 protocol places a 1024-byte limit on the size of the
+     "co_ownerid" string.
+
+Protecting NFSv4 lease state
+----------------------------
+
+NFSv4 servers utilize the "client_owner4" as described above to
+assign a unique lease to each client. Under this scheme, there are
+circumstances where clients can interfere with each other. This is
+referred to as "lease stealing".
+
+If distinct clients present the same "co_ownerid" string and use
+the same principal (for example, AUTH_SYS and UID 0), a server is
+unable to tell that the clients are not the same. Each distinct
+client presents a different boot verifier, so it appears to the
+server as if there is one client that is rebooting frequently.
+Neither client can maintain open or lock state in this scenario.
+
+If distinct clients present the same "co_ownerid" string and use
+distinct principals, the server is likely to allow the first client
+to operate normally but reject subsequent clients with the same
+"co_ownerid" string.
+
+If a client's "co_ownerid" string or principal is not stable,
+state recovery after a server or client reboot is not guaranteed.
+If a client unexpectedly restarts but presents a different
+"co_ownerid" string or principal to the server, the server orphans
+the client's previous open and lock state. This blocks access to
+locked files until the server removes the orphaned state.
+
+If the server restarts and a client presents a changed "co_ownerid"
+string or principal to the server, the server will not allow the
+client to reclaim its open and lock state, and may give those locks
+to other clients in the meantime. This is referred to as "lock
+stealing".
+
+Lease stealing and lock stealing increase the potential for denial
+of service and in rare cases even data corruption.
+
+Selecting an appropriate client identifier
+------------------------------------------
+
+By default, the Linux NFSv4 client implementation constructs its
+"co_ownerid" string starting with the words "Linux NFS" followed by
+the client's UTS node name (the same node name, incidentally, that
+is used as the "machine name" in an AUTH_SYS credential). In small
+deployments, this construction is usually adequate. Often, however,
+the node name by itself is not sufficiently unique, and can change
+unexpectedly. Problematic situations include:
+
+  - NFS-root (diskless) clients, where the local DHCP server (or
+    equivalent) does not provide a unique host name.
+
+  - "Containers" within a single Linux host. If each container has
+    a separate network namespace, but does not use the UTS namespace
+    to provide a unique host name, then there can be multiple NFS
+    client instances with the same host name.
+
+  - Clients across multiple administrative domains that access a
+    common NFS server. If hostnames are not assigned centrally
+    then uniqueness cannot be guaranteed unless a domain name is
+    included in the hostname.
+
+Linux provides two mechanisms to add uniqueness to its "co_ownerid"
+string:
+
+  nfs.nfs4_unique_id
+    This module parameter can set an arbitrary uniquifier string
+    via the kernel command line, or when the "nfs" module is
+    loaded.
+
+  /sys/fs/nfs/client/net/identifier
+    This virtual file, available since Linux 5.3, is local to the
+    network namespace in which it is accessed and so can provide
+    distinction between network namespaces (containers) when the
+    hostname remains uniform.
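For illustration, a boot-time script might persist a random uniquifier
and hand it to the "nfs" module whenever it loads. This is a hedged
sketch, not part of the original document: the paths
/etc/nfs4_unique_id and /etc/modprobe.d/nfs4-id.conf are hypothetical::

    # Generate the uniquifier once, then apply it on every boot via
    # the nfs.nfs4_unique_id module parameter.
    [ -f /etc/nfs4_unique_id ] || uuidgen -r > /etc/nfs4_unique_id
    echo "options nfs nfs4_unique_id=$(cat /etc/nfs4_unique_id)" \
        > /etc/modprobe.d/nfs4-id.conf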
+
+Note that the identifier file is empty on namespace creation. If the
+container system has access to some sort of per-container identity
+then that uniquifier can be used. For example, a uniquifier might
+be formed at boot using the container's internal identifier::
+
+  sha256sum /etc/machine-id | awk '{print $1}' \
+      > /sys/fs/nfs/client/net/identifier
+
+Security considerations
+-----------------------
+
+The use of cryptographic security for lease management operations
+is strongly encouraged.
+
+If NFS with Kerberos is not configured, a Linux NFSv4 client uses
+AUTH_SYS and UID 0 as the principal part of its client identity.
+This configuration is not only insecure, it increases the risk of
+lease and lock stealing. However, it might be the only choice for
+client configurations that have no local persistent storage.
+"co_ownerid" string uniqueness and persistence are critical in this
+case.
+
+When a Kerberos keytab is present on a Linux NFS client, the client
+attempts to use one of the principals in that keytab when
+identifying itself to servers. The "sec=" mount option does not
+control this behavior. Alternatively, a single-user client with a
+Kerberos principal can use that principal in place of the client's
+host principal.
+
+Using Kerberos for this purpose enables the client and server to
+use the same lease for operations covered by all "sec=" settings.
+Additionally, the Linux NFS client uses the RPCSEC_GSS security
+flavor with Kerberos and the integrity QOS to prevent in-transit
+modification of lease modification requests.
+
+Additional notes
+----------------
+The Linux NFSv4 client establishes a single lease on each NFSv4
+server it accesses. NFSv4 mounts from a Linux NFSv4 client of a
+particular server then share that lease.
+
+Once a client establishes open and lock state, the NFSv4 protocol
+enables lease state to transition to other servers, following data
+that has been migrated. This hides data migration completely from
+running applications. The Linux NFSv4 client facilitates state
+migration by presenting the same "client_owner4" to all servers it
+encounters.
+
+========
+See Also
+========
+
+  - nfs(5)
+  - kerberos(7)
+  - RFC 7530 for the NFSv4.0 specification
+  - RFC 8881 for the NFSv4.1 specification.
diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
index 33d588a01ace..0e98edd353b5 100644
--- a/Documentation/filesystems/nfs/exporting.rst
+++ b/Documentation/filesystems/nfs/exporting.rst
@@ -154,6 +154,11 @@ struct which has the following members:
     to find potential names, and matches inode numbers to find the correct
     match.
 
+  flags
+    Some filesystems may need to be handled differently than others. The
+    export_operations struct also includes a flags field that allows the
+    filesystem to communicate such information to nfsd. See the Export
+    Operations Flags section below for more explanation.
 
 A filehandle fragment consists of an array of 1 or more 4byte words,
 together with a one byte "type".
@@ -163,3 +168,50 @@ generated by encode_fh, in which case it will have been padded with
 nuls. Rather, the encode_fh routine should choose a "type" which
 indicates the decode_fh how much of the filehandle is valid, and how
 it should be interpreted.
+
+Export Operations Flags
+-----------------------
+In addition to the operation vector pointers, struct export_operations also
+contains a "flags" field that allows the filesystem to communicate to nfsd
+that it may want to do things differently when dealing with it. The
+following flags are defined:
+
+  EXPORT_OP_NOWCC - disable NFSv3 WCC attributes on this filesystem
+    RFC 1813 recommends that servers always send weak cache consistency
+    (WCC) data to the client after each operation. The server should
+    atomically collect attributes about the inode, do an operation on it,
+    and then collect the attributes afterward. This allows the client to
+    skip issuing GETATTRs in some situations but means that the server
+    is calling vfs_getattr for almost all RPCs. On some filesystems
+    (particularly those that are clustered or networked) this is expensive
+    and atomicity is difficult to guarantee. This flag indicates to nfsd
+    that it should skip providing WCC attributes to the client in NFSv3
+    replies when doing operations on this filesystem. Consider enabling
+    this on filesystems that have an expensive ->getattr inode operation,
+    or when atomicity between pre- and post-operation attribute collection
+    is impossible to guarantee.
+
+  EXPORT_OP_NOSUBTREECHK - disallow subtree checking on this fs
+    Many NFS operations deal with filehandles, which the server must then
+    vet to ensure that they live inside of an exported tree. When the
+    export consists of an entire filesystem, this is trivial. nfsd can just
+    ensure that the filehandle lives on the filesystem. When only part of a
+    filesystem is exported, however, nfsd must walk the ancestors of the
+    inode to ensure that it's within an exported subtree. This is an
+    expensive operation and not all filesystems can support it properly.
+    This flag exempts the filesystem from subtree checking and causes
+    exportfs to get back an error if it tries to enable subtree checking
+    on it.
+
+  EXPORT_OP_CLOSE_BEFORE_UNLINK - always close cached files before unlinking
+    On some exportable filesystems (such as NFS) unlinking a file that
+    is still open can cause a fair bit of extra work. For instance,
+    the NFS client will do a "sillyrename" to ensure that the file
+    sticks around while it's still open. When reexporting, that open
+    file is held by nfsd, so we usually end up doing a sillyrename and
+    then immediately deleting the sillyrenamed file when the link count
+    actually goes to zero. Sometimes this delete can race with other
+    operations (for instance an rmdir of the parent directory).
+    This flag causes nfsd to close any open files for this inode _before_
+    calling into the vfs to do an unlink or a rename that would replace
+    an existing file.
diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst
new file mode 100644
index 000000000000..8536134f31fd
--- /dev/null
+++ b/Documentation/filesystems/nfs/index.rst
@@ -0,0 +1,16 @@
+===============================
+NFS
+===============================
+
+
+.. toctree::
+   :maxdepth: 1
+
+   client-identifier
+   exporting
+   pnfs
+   rpc-cache
+   rpc-server-gss
+   nfs41-server
+   knfsd-stats
+   reexport
diff --git a/Documentation/filesystems/nfs/knfsd-stats.txt b/Documentation/filesystems/nfs/knfsd-stats.rst
index 1a5d82180b84..80bcf13550de 100644
--- a/Documentation/filesystems/nfs/knfsd-stats.txt
+++ b/Documentation/filesystems/nfs/knfsd-stats.rst
@@ -1,7 +1,9 @@
-
+============================
 Kernel NFS Server Statistics
 ============================
 
+:Authors: Greg Banks <gnb@sgi.com> - 26 Mar 2009
+
 This document describes the format and semantics of the statistics
 which the kernel NFS server makes available to userspace.
 These statistics are available in several text-form pseudo files, each of
@@ -18,7 +20,7 @@ by parsing routines. All other lines contain a sequence of fields separated
 by whitespace.
 
 /proc/fs/nfsd/pool_stats
-------------------------
+========================
 
 This file is available in kernels from 2.6.30 onwards, if the
 /proc/fs/nfsd filesystem is mounted (it almost always should be).
@@ -109,15 +111,12 @@ this case), or the transport can be enqueued for later attention
 (sockets-enqueued counts this case), or the packet can be temporarily
 deferred because the transport is currently being used by an nfsd
 thread. This last case is not very interesting and is not explicitly
-counted, but can be inferred from the other counters thus:
+counted, but can be inferred from the other counters thus::
 
-packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
+  packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
 
 More
-----
-Descriptions of the other statistics file should go here.
-
+====
 
-Greg Banks <gnb@sgi.com>
-26 Mar 2009
+Descriptions of the other statistics files should go here.
diff --git a/Documentation/filesystems/nfs/nfs41-server.rst b/Documentation/filesystems/nfs/nfs41-server.rst
new file mode 100644
index 000000000000..16b5f02f81c3
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfs41-server.rst
@@ -0,0 +1,256 @@
+=============================
+NFSv4.1 Server Implementation
+=============================
+
+Server support for minorversion 1 can be controlled using the
+/proc/fs/nfsd/versions control file. The string read from this file
+will contain either "+4.1" or "-4.1" accordingly.
+
+Currently, server support for minorversion 1 is enabled by default.
+It can be disabled at run time by writing the string "-4.1" to
+the /proc/fs/nfsd/versions control file. Note that to write this
+control file, the nfsd service must be taken down. You can use rpc.nfsd
+for this; see rpc.nfsd(8) and the sketch following the abbreviation
+key below.
+
+(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
+"-4", respectively. Therefore, code meant to work on both new and old
+kernels must turn 4.1 on or off *before* turning support for version 4
+on or off; rpc.nfsd does this correctly.)
+
+The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
+on RFC 5661.
+
+Of the many new features in NFSv4.1, the current implementation
+focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
+"exactly once" semantics and better control and throttling of the
+resources allocated for each client.
+
+The table below, taken from the NFSv4.1 document, lists
+the operations that are mandatory to implement (REQ), optional
+(OPT), and NFSv4.0 operations that are required not to implement (MNI)
+in minor version 1. The first column indicates the operations that
+are not supported yet by the Linux server implementation.
+
+The OPTIONAL features identified and their abbreviations are as follows:
+
+- **pNFS** Parallel NFS
+- **FDELG** File Delegations
+- **DDELG** Directory Delegations
+
+The following abbreviations indicate the Linux server implementation status.
+
+- **I** Implemented NFSv4.1 operations.
+- **NS** Not Supported.
+- **NS\*** Unimplemented optional feature.
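As an illustration of the run-time toggle described at the top of this
document, an administrator might disable and then verify minorversion 1
support as follows. This is a hedged sketch, not part of the original
text; the thread count of 8 is an arbitrary assumption::

    rpc.nfsd 0                       # take the nfsd service down first
    echo "-4.1" > /proc/fs/nfsd/versions
    rpc.nfsd 8                       # restart nfsd with 8 threads
    cat /proc/fs/nfsd/versions       # expect the list to include "-4.1"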
+ +Operations +========== + ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| Implementation status | Operation | REQ,REC, OPT or NMI | Feature (REQ, REC or OPT) | Definition | ++=======================+======================+=====================+===========================+================+ +| | ACCESS | REQ | | Section 18.1 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | BACKCHANNEL_CTL | REQ | | Section 18.33 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | BIND_CONN_TO_SESSION | REQ | | Section 18.34 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | CLOSE | REQ | | Section 18.2 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | COMMIT | REQ | | Section 18.3 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | CREATE | REQ | | Section 18.4 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | CREATE_SESSION | REQ | | Section 18.36 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| NS* | DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | DELEGRETURN | OPT | FDELG, | Section 18.6 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | | | DDELG, pNFS | | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | | | (REQ) | | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | DESTROY_CLIENTID | REQ | | Section 18.50 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | DESTROY_SESSION | REQ | | Section 18.37 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | EXCHANGE_ID | REQ | | Section 18.35 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | FREE_STATEID | REQ | | Section 18.38 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | GETATTR | REQ | | Section 18.7 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| NS* | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | GETFH | REQ | | Section 18.8 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| NS* | GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | 
++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | LINK | OPT | | Section 18.9 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | LOCK | REQ | | Section 18.10 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | LOCKT | REQ | | Section 18.11 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | LOCKU | REQ | | Section 18.12 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | LOOKUP | REQ | | Section 18.13 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | LOOKUPP | REQ | | Section 18.14 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | NVERIFY | REQ | | Section 18.15 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | OPEN | REQ | | Section 18.16 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| NS* | OPENATTR | OPT | | Section 18.17 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | OPEN_CONFIRM | MNI | | N/A | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | OPEN_DOWNGRADE | REQ | | Section 18.18 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | PUTFH | REQ | | Section 18.19 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | PUTPUBFH | REQ | | Section 18.20 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | PUTROOTFH | REQ | | Section 18.21 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | READ | REQ | | Section 18.22 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | READDIR | REQ | | Section 18.23 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | READLINK | OPT | | Section 18.24 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | RECLAIM_COMPLETE | REQ | | Section 18.51 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | RELEASE_LOCKOWNER | MNI | | N/A | 
++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | REMOVE | REQ | | Section 18.25 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | RENAME | REQ | | Section 18.26 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | RENEW | MNI | | N/A | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | RESTOREFH | REQ | | Section 18.27 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | SAVEFH | REQ | | Section 18.28 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | SECINFO | REQ | | Section 18.29 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | | | layout (REQ) | Section 13.12 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | SEQUENCE | REQ | | Section 18.46 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | SETATTR | REQ | | Section 18.30 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | SETCLIENTID | MNI | | N/A | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | SETCLIENTID_CONFIRM | MNI | | N/A | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| NS | SET_SSV | REQ | | Section 18.47 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| I | TEST_STATEID | REQ | | Section 18.48 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | VERIFY | REQ | | Section 18.31 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| NS* | WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ +| | WRITE | REQ | | Section 18.32 | ++-----------------------+----------------------+---------------------+---------------------------+----------------+ + + +Callback Operations +=================== ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| Implementation status | Operation | REQ,REC, OPT or NMI | Feature (REQ, REC or OPT) | Definition | ++=======================+=========================+=====================+===========================+===============+ +| | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| I | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| NS* | 
CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| NS* | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| NS* | CB_NOTIFY_LOCK | OPT | | Section 20.11 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| NS* | CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | CB_RECALL | OPT | FDELG, | Section 20.2 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | DDELG, pNFS | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | (REQ) | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| NS* | CB_RECALL_ANY | OPT | FDELG, | Section 20.6 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | DDELG, pNFS | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | (REQ) | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| NS | CB_RECALL_SLOT | REQ | | Section 20.8 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| NS* | CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | (REQ) | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | DDELG, pNFS | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | (REQ) | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| NS* | CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | DDELG, pNFS | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +| | | | (REQ) | | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ + + +Implementation notes: +===================== + +SSV: + The spec claims this is mandatory, but we don't actually know of any + implementations, so we're ignoring it for now. The server returns + NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof. + +GSS on the backchannel: + Again, theoretically required but not widely implemented (in + particular, the current Linux client doesn't request it). We return + NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION. 
+ +DELEGPURGE: + mandatory only for servers that support CLAIM_DELEGATE_PREV and/or + CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that + persist across client reboots). Thus we need not implement this for + now. + +EXCHANGE_ID: + implementation ids are ignored + +CREATE_SESSION: + backchannel attributes are ignored + +SEQUENCE: + no support for dynamic slot table renegotiation (optional) + +Nonstandard compound limitations: + No support for a sessions fore channel RPC compound that requires both a + ca_maxrequestsize request and a ca_maxresponsesize reply, so we may + fail to live up to the promise we made in CREATE_SESSION fore channel + negotiation. + +See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues. diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt deleted file mode 100644 index 682a59fabe3f..000000000000 --- a/Documentation/filesystems/nfs/nfs41-server.txt +++ /dev/null @@ -1,173 +0,0 @@ -NFSv4.1 Server Implementation - -Server support for minorversion 1 can be controlled using the -/proc/fs/nfsd/versions control file. The string output returned -by reading this file will contain either "+4.1" or "-4.1" -correspondingly. - -Currently, server support for minorversion 1 is enabled by default. -It can be disabled at run time by writing the string "-4.1" to -the /proc/fs/nfsd/versions control file. Note that to write this -control file, the nfsd service must be taken down. You can use rpc.nfsd -for this; see rpc.nfsd(8). - -(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and -"-4", respectively. Therefore, code meant to work on both new and old -kernels must turn 4.1 on or off *before* turning support for version 4 -on or off; rpc.nfsd does this correctly.) - -The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based -on RFC 5661. - -From the many new features in NFSv4.1 the current implementation -focuses on the mandatory-to-implement NFSv4.1 Sessions, providing -"exactly once" semantics and better control and throttling of the -resources allocated for each client. - -The table below, taken from the NFSv4.1 document, lists -the operations that are mandatory to implement (REQ), optional -(OPT), and NFSv4.0 operations that are required not to implement (MNI) -in minor version 1. The first column indicates the operations that -are not supported yet by the linux server implementation. - -The OPTIONAL features identified and their abbreviations are as follows: - pNFS Parallel NFS - FDELG File Delegations - DDELG Directory Delegations - -The following abbreviations indicate the linux server implementation status. - I Implemented NFSv4.1 operations. - NS Not Supported. - NS* Unimplemented optional feature. 
- -Operations - - +----------------------+------------+--------------+----------------+ - | Operation | REQ, REC, | Feature | Definition | - | | OPT, or | (REQ, REC, | | - | | MNI | or OPT) | | - +----------------------+------------+--------------+----------------+ - | ACCESS | REQ | | Section 18.1 | -I | BACKCHANNEL_CTL | REQ | | Section 18.33 | -I | BIND_CONN_TO_SESSION | REQ | | Section 18.34 | - | CLOSE | REQ | | Section 18.2 | - | COMMIT | REQ | | Section 18.3 | - | CREATE | REQ | | Section 18.4 | -I | CREATE_SESSION | REQ | | Section 18.36 | -NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 | - | DELEGRETURN | OPT | FDELG, | Section 18.6 | - | | | DDELG, pNFS | | - | | | (REQ) | | -I | DESTROY_CLIENTID | REQ | | Section 18.50 | -I | DESTROY_SESSION | REQ | | Section 18.37 | -I | EXCHANGE_ID | REQ | | Section 18.35 | -I | FREE_STATEID | REQ | | Section 18.38 | - | GETATTR | REQ | | Section 18.7 | -I | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | -NS*| GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | - | GETFH | REQ | | Section 18.8 | -NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | -I | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | -I | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | -I | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | - | LINK | OPT | | Section 18.9 | - | LOCK | REQ | | Section 18.10 | - | LOCKT | REQ | | Section 18.11 | - | LOCKU | REQ | | Section 18.12 | - | LOOKUP | REQ | | Section 18.13 | - | LOOKUPP | REQ | | Section 18.14 | - | NVERIFY | REQ | | Section 18.15 | - | OPEN | REQ | | Section 18.16 | -NS*| OPENATTR | OPT | | Section 18.17 | - | OPEN_CONFIRM | MNI | | N/A | - | OPEN_DOWNGRADE | REQ | | Section 18.18 | - | PUTFH | REQ | | Section 18.19 | - | PUTPUBFH | REQ | | Section 18.20 | - | PUTROOTFH | REQ | | Section 18.21 | - | READ | REQ | | Section 18.22 | - | READDIR | REQ | | Section 18.23 | - | READLINK | OPT | | Section 18.24 | - | RECLAIM_COMPLETE | REQ | | Section 18.51 | - | RELEASE_LOCKOWNER | MNI | | N/A | - | REMOVE | REQ | | Section 18.25 | - | RENAME | REQ | | Section 18.26 | - | RENEW | MNI | | N/A | - | RESTOREFH | REQ | | Section 18.27 | - | SAVEFH | REQ | | Section 18.28 | - | SECINFO | REQ | | Section 18.29 | -I | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, | - | | | layout (REQ) | Section 13.12 | -I | SEQUENCE | REQ | | Section 18.46 | - | SETATTR | REQ | | Section 18.30 | - | SETCLIENTID | MNI | | N/A | - | SETCLIENTID_CONFIRM | MNI | | N/A | -NS | SET_SSV | REQ | | Section 18.47 | -I | TEST_STATEID | REQ | | Section 18.48 | - | VERIFY | REQ | | Section 18.31 | -NS*| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | - | WRITE | REQ | | Section 18.32 | - -Callback Operations - - +-------------------------+-----------+-------------+---------------+ - | Operation | REQ, REC, | Feature | Definition | - | | OPT, or | (REQ, REC, | | - | | MNI | or OPT) | | - +-------------------------+-----------+-------------+---------------+ - | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | -I | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | -NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | -NS*| CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | -NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 | -NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | - | CB_RECALL | OPT | FDELG, | Section 20.2 | - | | | DDELG, pNFS | | - | | | (REQ) | | -NS*| CB_RECALL_ANY | OPT | FDELG, | Section 20.6 | - | | | DDELG, pNFS | | - | | | (REQ) | | -NS | CB_RECALL_SLOT | REQ | | Section 20.8 | -NS*| CB_RECALLABLE_OBJ_AVAIL | 
OPT | DDELG, pNFS | Section 20.7 | - | | | (REQ) | | -I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 | - | | | DDELG, pNFS | | - | | | (REQ) | | -NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 | - | | | DDELG, pNFS | | - | | | (REQ) | | - +-------------------------+-----------+-------------+---------------+ - -Implementation notes: - -SSV: -* The spec claims this is mandatory, but we don't actually know of any - implementations, so we're ignoring it for now. The server returns - NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof. - -GSS on the backchannel: -* Again, theoretically required but not widely implemented (in - particular, the current Linux client doesn't request it). We return - NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION. - -DELEGPURGE: -* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or - CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that - persist across client reboots). Thus we need not implement this for - now. - -EXCHANGE_ID: -* implementation ids are ignored - -CREATE_SESSION: -* backchannel attributes are ignored - -SEQUENCE: -* no support for dynamic slot table renegotiation (optional) - -Nonstandard compound limitations: -* No support for a sessions fore channel RPC compound that requires both a - ca_maxrequestsize request and a ca_maxresponsesize reply, so we may - fail to live up to the promise we made in CREATE_SESSION fore channel - negotiation. - -See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues. diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.rst index 80dc0bdc302a..7c470ecdc3a9 100644 --- a/Documentation/filesystems/nfs/pnfs.txt +++ b/Documentation/filesystems/nfs/pnfs.rst @@ -1,15 +1,17 @@ -Reference counting in pnfs: +========================== +Reference counting in pnfs ========================== The are several inter-related caches. We have layouts which can reference multiple devices, each of which can reference multiple data servers. Each data server can be referenced by multiple devices. Each device -can be referenced by multiple layouts. To keep all of this straight, +can be referenced by multiple layouts. To keep all of this straight, we need to reference count. struct pnfs_layout_hdr ----------------------- +====================== + The on-the-wire command LAYOUTGET corresponds to struct pnfs_layout_segment, usually referred to by the variable name lseg. Each nfs_inode may hold a pointer to a cache of these layout @@ -25,7 +27,8 @@ the reference count, as the layout is kept around by the lseg that keeps it in the list. deviceid_cache --------------- +============== + lsegs reference device ids, which are resolved per nfs_client and layout driver type. The device ids are held in a RCU cache (struct nfs4_deviceid_cache). The cache itself is referenced across each @@ -38,24 +41,26 @@ justification, but seems reasonable given that we can have multiple deviceid's per filesystem, and multiple filesystems per nfs_client. The hash code is copied from the nfsd code base. A discussion of -hashing and variations of this algorithm can be found at: -http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809 +hashing and variations of this algorithm can be found `here. +<http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809>`_ data server cache ------------------ +================= + file driver devices refer to data servers, which are kept in a module level cache. 
Its
reference is held over the lifetime of the deviceid pointing to it.
 
 lseg
-----
+====
+
 lseg maintains an extra reference corresponding to the NFS_LSEG_VALID
 bit which holds it in the pnfs_layout_hdr's list. When the final lseg
 is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED
 bit is set, preventing any new lsegs from being added.
 
 layout drivers
---------------
+==============
 
 pNFS utilizes what are called layout drivers. The STD defines 4 basic
 layout types: "files", "objects", "blocks", and "flexfiles". For each
@@ -68,6 +73,6 @@ Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory
 Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory
 
 blocks-layout setup
--------------------
+===================
 
 TODO: Document the setup needs of the blocks layout driver
diff --git a/Documentation/filesystems/nfs/reexport.rst b/Documentation/filesystems/nfs/reexport.rst
new file mode 100644
index 000000000000..ff9ae4a46530
--- /dev/null
+++ b/Documentation/filesystems/nfs/reexport.rst
@@ -0,0 +1,113 @@
+Reexporting NFS filesystems
+===========================
+
+Overview
+--------
+
+It is possible to reexport an NFS filesystem over NFS. However, this
+feature comes with a number of limitations. Before trying it, we
+recommend some careful research to determine whether it will work for
+your purposes.
+
+A discussion of current known limitations follows.
+
+"fsid=" required, crossmnt broken
+---------------------------------
+
+We require the "fsid=" export option on any reexport of an NFS
+filesystem. You can use "uuidgen -r" to generate a unique argument.
+
+The "crossmnt" export option does not propagate "fsid=", so it will not
+allow traversing into further NFS filesystems; if you wish to export NFS
+filesystems mounted under the exported filesystem, you'll need to export
+them explicitly, assigning each its own unique "fsid=" option.
+
+Reboot recovery
+---------------
+
+The NFS protocol's normal reboot recovery mechanisms don't work for the
+case when the reexport server reboots. Clients will lose any locks
+they held before the reboot, and further I/O will result in errors.
+Closing and reopening files should clear the errors.
+
+Filehandle limits
+-----------------
+
+If the original server uses an X-byte filehandle for a given object, the
+reexport server's filehandle for the reexported object will be X+22
+bytes, rounded up to the nearest multiple of four bytes (see the small
+calculator sketch below).
+
+The result must fit into the RFC-mandated filehandle size limits:
+
++-------+-----------+
+| NFSv2 | 32 bytes  |
++-------+-----------+
+| NFSv3 | 64 bytes  |
++-------+-----------+
+| NFSv4 | 128 bytes |
++-------+-----------+
+
+So, for example, you will only be able to reexport a filesystem over
+NFSv2 if the original server gives you filehandles that fit in 10
+bytes, which is unlikely.
+
+In general there's no way to know the maximum filehandle size given out
+by an NFS server without asking the server vendor.
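As a quick way to apply the size rule above, the post-reexport length
for a given original filehandle length can be computed directly. A
minimal sketch; the helper name is invented for illustration::

    # round (X + 22) up to the next multiple of four bytes
    reexported_fh_len() { echo $(( ($1 + 22 + 3) / 4 * 4 )); }
    reexported_fh_len 28    # ext4:  prints 52
    reexported_fh_len 32    # xfs:   prints 56
    reexported_fh_len 40    # btrfs: prints 64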
+The following table gives a few examples. The first column shows the
+typical length of the filehandle from a Linux server exporting the given
+filesystem; the second shows the length after that NFS export is
+reexported by another Linux host:
+
++--------+-------------------+----------------+
+|        | filehandle length | after reexport |
++========+===================+================+
+| ext4:  | 28 bytes          | 52 bytes       |
++--------+-------------------+----------------+
+| xfs:   | 32 bytes          | 56 bytes       |
++--------+-------------------+----------------+
+| btrfs: | 40 bytes          | 64 bytes       |
++--------+-------------------+----------------+
+
+All will therefore fit in an NFSv3 or NFSv4 filehandle after reexport,
+but none are reexportable over NFSv2.
+
+Linux server filehandles are a bit more complicated than this, though;
+for example:
+
+  - The (non-default) "subtree_check" export option generally
+    requires another 4 to 8 bytes in the filehandle.
+  - If you export a subdirectory of a filesystem (instead of
+    exporting the filesystem root), that also usually adds 4 to 8
+    bytes.
+  - If you export over NFSv2, knfsd usually uses a shorter
+    filesystem identifier that saves 8 bytes.
+  - The root directory of an export uses a shorter filehandle.
+
+As you can see, the 128-byte NFSv4 filehandle is large enough that
+you're unlikely to have trouble using NFSv4 to reexport any filesystem
+exported from a Linux server. In general, if the original server is
+something that also supports NFSv3, you're *probably* OK. Reexporting
+over NFSv3 may be dicier, and reexporting over NFSv2 will probably
+never work.
+
+For more details of Linux filehandle structure, the best reference is
+the source code and comments; see in particular:
+
+  - include/linux/exportfs.h:enum fid_type
+  - include/uapi/linux/nfsd/nfsfh.h:struct nfs_fhbase_new
+  - fs/nfsd/nfsfh.c:set_version_and_fsid_type
+  - fs/nfs/export.c:nfs_encode_fh
+
+Open DENY bits ignored
+----------------------
+
+Since NFSv4, the protocol has supported ALLOW and DENY bits taken from
+Windows, which allow you, for example, to open a file in a mode that
+forbids other read opens or write opens. The Linux client doesn't use
+them, and the server's support has always been incomplete: they are
+enforced only against other NFS users, not against processes accessing
+the exported filesystem locally. A reexport server will also not pass
+them along to the original server, so they will not be enforced between
+clients of different reexport servers.
diff --git a/Documentation/filesystems/nfs/rpc-cache.txt b/Documentation/filesystems/nfs/rpc-cache.rst
index c4dac829db0f..bb164eea969b 100644
--- a/Documentation/filesystems/nfs/rpc-cache.txt
+++ b/Documentation/filesystems/nfs/rpc-cache.rst
@@ -1,9 +1,14 @@
-	This document gives a brief introduction to the caching
+=========
+RPC Cache
+=========
+
+This document gives a brief introduction to the caching
 mechanisms in the sunrpc layer that are used, in particular,
 for NFS authentication.
 
-CACHES
+Caches
 ======
+
 The caching replaces the old exports table and allows for
 a wide variety of values to be cached.
 
@@ -12,6 +17,7 @@ quite possibly very different in content and use. There is a corpus
 of common code for managing these caches.
Examples of caches that are likely to be needed are: + - mapping from IP address to client name - mapping from client name and filesystem to export options - mapping from UID to list of GIDs, to work around NFS's limitation @@ -21,6 +27,7 @@ Examples of caches that are likely to be needed are: - mapping from network identify to public key for crypto authentication. The common code handles such things as: + - general cache lookup with correct locking - supporting 'NEGATIVE' as well as positive entries - allowing an EXPIRED time on cache items, and removing @@ -35,60 +42,66 @@ The common code handles such things as: Creating a Cache ---------------- -1/ A cache needs a datum to store. This is in the form of a - structure definition that must contain a - struct cache_head +- A cache needs a datum to store. This is in the form of a + structure definition that must contain a struct cache_head as an element, usually the first. It will also contain a key and some content. Each cache element is reference counted and contains expiry and update times for use in cache management. -2/ A cache needs a "cache_detail" structure that +- A cache needs a "cache_detail" structure that describes the cache. This stores the hash table, some parameters for cache management, and some operations detailing how to work with particular cache items. - The operations requires are: - struct cache_head *alloc(void) - This simply allocates appropriate memory and returns - a pointer to the cache_detail embedded within the - structure - void cache_put(struct kref *) - This is called when the last reference to an item is - dropped. The pointer passed is to the 'ref' field - in the cache_head. cache_put should release any - references create by 'cache_init' and, if CACHE_VALID - is set, any references created by cache_update. - It should then release the memory allocated by - 'alloc'. - int match(struct cache_head *orig, struct cache_head *new) - test if the keys in the two structures match. Return - 1 if they do, 0 if they don't. - void init(struct cache_head *orig, struct cache_head *new) - Set the 'key' fields in 'new' from 'orig'. This may - include taking references to shared objects. - void update(struct cache_head *orig, struct cache_head *new) - Set the 'content' fileds in 'new' from 'orig'. - int cache_show(struct seq_file *m, struct cache_detail *cd, - struct cache_head *h) - Optional. Used to provide a /proc file that lists the - contents of a cache. This should show one item, - usually on just one line. - int cache_request(struct cache_detail *cd, struct cache_head *h, - char **bpp, int *blen) - Format a request to be send to user-space for an item - to be instantiated. *bpp is a buffer of size *blen. - bpp should be moved forward over the encoded message, - and *blen should be reduced to show how much free - space remains. Return 0 on success or <0 if not - enough room or other problem. - int cache_parse(struct cache_detail *cd, char *buf, int len) - A message from user space has arrived to fill out a - cache entry. It is in 'buf' of length 'len'. - cache_parse should parse this, find the item in the - cache with sunrpc_cache_lookup_rcu, and update the item - with sunrpc_cache_update. - - -3/ A cache needs to be registered using cache_register(). 
This
+
+  The operations are:
+
+  struct cache_head \*alloc(void)
+    This simply allocates appropriate memory and returns
+    a pointer to the cache_detail embedded within the
+    structure
+
+  void cache_put(struct kref \*)
+    This is called when the last reference to an item is
+    dropped. The pointer passed is to the 'ref' field
+    in the cache_head. cache_put should release any
+    references created by 'cache_init' and, if CACHE_VALID
+    is set, any references created by cache_update.
+    It should then release the memory allocated by
+    'alloc'.
+
+  int match(struct cache_head \*orig, struct cache_head \*new)
+    Test if the keys in the two structures match. Return
+    1 if they do, 0 if they don't.
+
+  void init(struct cache_head \*orig, struct cache_head \*new)
+    Set the 'key' fields in 'new' from 'orig'. This may
+    include taking references to shared objects.
+
+  void update(struct cache_head \*orig, struct cache_head \*new)
+    Set the 'content' fields in 'new' from 'orig'.
+
+  int cache_show(struct seq_file \*m, struct cache_detail \*cd, struct cache_head \*h)
+    Optional. Used to provide a /proc file that lists the
+    contents of a cache. This should show one item,
+    usually on just one line.
+
+  int cache_request(struct cache_detail \*cd, struct cache_head \*h, char \*\*bpp, int \*blen)
+    Format a request to be sent to user-space for an item
+    to be instantiated. \*bpp is a buffer of size \*blen.
+    bpp should be moved forward over the encoded message,
+    and \*blen should be reduced to show how much free
+    space remains. Return 0 on success or <0 if not
+    enough room or other problem.
+
+  int cache_parse(struct cache_detail \*cd, char \*buf, int len)
+    A message from user space has arrived to fill out a
+    cache entry. It is in 'buf' of length 'len'.
+    cache_parse should parse this, find the item in the
+    cache with sunrpc_cache_lookup_rcu, and update the item
+    with sunrpc_cache_update.
+
+
+- A cache needs to be registered using cache_register(). This
   includes it on a list of caches that will be regularly
   cleaned to discard old data.
@@ -107,7 +120,7 @@ cache_check will return -ENOENT if the entry is negative or
 if an upcall is needed but not possible, -EAGAIN if an upcall
 is pending, or 0 if the data is valid;
 
-cache_check can be passed a "struct cache_req *". This structure is
+cache_check can be passed a "struct cache_req\*". This structure is
 typically embedded in the actual request and can be used to
 create a deferred copy of the request (struct cache_deferred_req).
 This is done when the found cache item is not uptodate, but there is reason to
@@ -139,9 +152,11 @@ The 'channel' works a bit like a datagram socket. Each 'write' is
 passed as a whole to the cache for parsing and interpretation.
 Each cache can treat the write requests differently, but it is
 expected that a message written will contain:
+
   - a key
   - an expiry time
   - a content.
+
 with the intention that an item in the cache with the given key
 should be created or updated to have the given content, and the
 expiry time should be set on that item.
@@ -156,7 +171,8 @@ If there are no more requests to return, read will return EOF, but a
 select or poll for read will block waiting for another request to be
 added.
 
-Thus a user-space helper is likely to:
+Thus a user-space helper is likely to (a concrete sketch follows below)::
+
   open the channel.
   select for readable
   read a request
@@ -175,12 +191,13 @@ Each cache should also define a "cache_request" method which
 takes a cache item and encodes a request into the buffer
 provided.
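To make the channel protocol concrete, here is a minimal sketch of such
a helper loop in shell. It is not part of the original document: the
auth.unix.ip cache, its "class IP" request and "class IP expiry domain"
reply formats, and the fixed five-minute expiry are all assumptions::

    #!/bin/bash
    # Hypothetical helper: answer auth.unix.ip lookups with one domain.
    exec 3<> /proc/net/rpc/auth.unix.ip/channel
    while read -u 3 class ip; do
        # reply with the key fields, an absolute expiry time, then content
        echo "$class $ip $(( $(date +%s) + 300 )) localhost" >&3
    done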
-Note: If a cache has no active readers on the channel, and has had not
-active readers for more than 60 seconds, further requests will not be
-added to the channel but instead all lookups that do not find a valid
-entry will fail. This is partly for backward compatibility: The
-previous nfs exports table was deemed to be authoritative and a
-failed lookup meant a definite 'no'.
+.. note::
+  If a cache has no active readers on the channel, and has had no
+  active readers for more than 60 seconds, further requests will not be
+  added to the channel but instead all lookups that do not find a valid
+  entry will fail. This is partly for backward compatibility: The
+  previous nfs exports table was deemed to be authoritative and a
+  failed lookup meant a definite 'no'.
 
 request/response format
 -----------------------
@@ -193,10 +210,11 @@ with precisely one newline character which should be at the end.
 Fields within the record should be separated by spaces, normally one.
 If spaces, newlines, or nul characters are needed in a field they must
 be quoted. Two mechanisms are available:
+
+- If a field begins '\x' then it must contain an even number of
   hex digits, and pairs of these digits provide the bytes in the
   field.
-2/ otherwise a \ in the field must be followed by 3 octal digits
+- otherwise a \ in the field must be followed by 3 octal digits
   which give the code for a byte. Other characters are treated
   as themselves. At the very least, space, newline, nul, and
   '\' must be quoted in this way.
diff --git a/Documentation/filesystems/nfs/rpc-server-gss.txt b/Documentation/filesystems/nfs/rpc-server-gss.rst
index 310bbbaf9080..ccaea9e7cea2 100644
--- a/Documentation/filesystems/nfs/rpc-server-gss.txt
+++ b/Documentation/filesystems/nfs/rpc-server-gss.rst
@@ -1,4 +1,4 @@
-
+=========================================
 rpcsec_gss support for kernel RPC servers
 =========================================
 
@@ -9,14 +9,16 @@ NFSv4.1 and higher don't require the client to act as a server for the
 purposes of authentication.)
 
 RPCGSS is specified in a few IETF documents:
- - RFC2203 v1: http://tools.ietf.org/rfc/rfc2203.txt
- - RFC5403 v2: http://tools.ietf.org/rfc/rfc5403.txt
-and there is a 3rd version being proposed:
- - http://tools.ietf.org/id/draft-williams-rpcsecgssv3.txt
-   (At draft n. 02 at the time of writing)
+
+ - RFC2203 v1: https://tools.ietf.org/rfc/rfc2203.txt
+ - RFC5403 v2: https://tools.ietf.org/rfc/rfc5403.txt
+
+There is a third version that we don't currently implement:
+
+ - RFC7861 v3: https://tools.ietf.org/rfc/rfc7861.txt
 
 Background
-----------
+==========
 
 The RPCGSS Authentication method describes a way to perform GSSAPI
 Authentication for NFS. Although GSSAPI is itself completely mechanism
@@ -29,6 +31,7 @@ depends on GSSAPI extensions that are KRB5 specific.
 
 GSSAPI is a complex library, and implementing it completely in the
 kernel is unwarranted. However GSSAPI operations are fundamentally
 separable into two parts:
+
 - initial context establishment
 - integrity/privacy protection (signing and encrypting of individual
   packets)
 
@@ -41,7 +44,7 @@ kernel, but leave the initial context establishment to userspace.
 We need upcalls to request userspace to perform context establishment.
 NFS Server Legacy Upcall Mechanism
-----------------------------------
+==================================
 
 The classic upcall mechanism uses a custom text-based upcall protocol
 to talk to a custom daemon called rpc.svcgssd that is provided by the
@@ -62,21 +65,20 @@ groups) due to a limitation on the size of the buffer that can be sent
 back to the kernel (4KiB).
 
 NFS Server New RPC Upcall Mechanism
------------------------------------
+===================================
 
 The newer upcall mechanism uses RPC over a unix socket to a daemon
 called gss-proxy, implemented by a userspace program called Gssproxy.
 
-The gss_proxy RPC protocol is currently documented here:
-
-  https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation
+The gss_proxy RPC protocol is currently documented `here
+<https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation>`_.
 
 This upcall mechanism uses the kernel rpc client and connects to the
 gssproxy userspace program over a regular unix socket. The gssproxy
 protocol does not suffer from the size limitations of the legacy
 protocol.
 
 Negotiating Upcall Mechanisms
------------------------------
+=============================
 
 To provide backward compatibility, the kernel defaults to using the
 legacy mechanism. To switch to the new mechanism, gss-proxy must bind