.\" Copyright (c) 1993
.\"     The Regents of the University of California.  All rights reserved.
.\"
.\" This document is derived from software contributed to Berkeley by
.\" Rick Macklem at The University of Guelph.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"     This product includes software developed by the University of
.\"     California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)2.t 8.1 (Berkeley) 6/8/93
.\"
.sh 1 "Not Quite NFS, Crash Tolerant Cache Consistency for NFS"
.pp
Not Quite NFS (NQNFS) is an NFS-like protocol designed to maintain full cache
consistency between clients in a crash tolerant manner.
It is an adaptation of the NFS protocol such that the server supports both NFS
and NQNFS clients while maintaining full consistency between the server and
NQNFS clients.
This section borrows heavily from work done on Spritely-NFS [Srinivasan89],
but uses Leases [Gray89] to avoid the need to recover server state information
after a crash.
The reader is strongly encouraged to read these references before
trying to grasp the material presented here.
.sh 2 "Overview"
.pp
The protocol maintains cache consistency by using a somewhat
Sprite [Nelson88] like protocol,
but is based on short term leases\** instead of hard state information
about open files.
.(f
\** A lease is a ticket permitting an activity that is
valid until some expiry time.
.)f
The basic principle is that the protocol will disable client caching of a
file whenever that file is write shared\**.
.(f
\** Write sharing occurs when at least one client is modifying a file while
other client(s) are reading the file.
.)f
Whenever a client wishes to cache data for a file it must hold a valid lease.
There are three types of leases: read caching, write caching and non-caching.
The last type requires that all file operations be done synchronously with
the server via
RPCs.
A read caching lease allows for client data caching, but no file modifications
may be done.
A write caching lease allows for client caching of writes,
but requires that all writes be pushed to the server when the lease expires.
If a client has dirty buffers\**
.(f
\** Cached write data that is not yet pushed (written) to the server.
.)f
when a write caching lease has almost expired, it will attempt to
extend the lease but is required to push the dirty buffers if extension fails.
A client gets leases by either doing a \fBGetLease RPC\fR or by piggybacking
a \fBGetLease Request\fR onto another RPC. Piggybacking is supported for the
frequent RPCs Getattr, Setattr, Lookup, Readlink, Read, Write and Readdir
in an effort to minimize the number of \fBGetLease RPCs\fR required.
All leases are at the granularity of a file, since all NFS RPCs operate on
individual files and NFS has no intrinsic notion of a file hierarchy.
Directories, symbolic links and file attributes may be read cached but
are not write cached.
The exception here is the attribute file_size, which is updated during cached
writing on the client to reflect a growing file.
.pp
It is the server's responsibility to ensure that consistency is maintained
among the NQNFS clients by disabling client caching whenever a server file
operation would cause inconsistencies.
The possibility of inconsistencies occurs whenever a client has
a write caching lease and any other client,
or local operations on the server,
tries to access the file or when
a modify operation is attempted on a file being read cached by client(s).
At this time, the server sends an \fBeviction notice\fR to all clients holding
the lease and then waits for lease termination.
Lease termination occurs when a \fBvacated the premises\fR message has been
received from all the clients that have signed the lease or when the lease
expires via
timeout.
The message pair \fBeviction notice\fR and \fBvacated the premises\fR roughly
corresponds to a Sprite server\(->client callback, but is not implemented as an
actual RPC, to avoid the server waiting indefinitely for a reply from a dead
client.
.pp
Server consistency checking can be viewed as issuing intrinsic leases for a
file operation for the duration of the operation only. For example, the
\fBCreate RPC\fR will get an intrinsic write lease on the directory in which
the file is being created, disabling client read caches for that directory.
.pp
By relegating this responsibility to the server, consistency between the
server and NQNFS clients is maintained when NFS clients are modifying the
file system as well.\**
.(f
\** The NFS clients will continue to be \fIapproximately\fR consistent with
the server.
.)f
.pp
The leases are issued as time intervals to avoid the requirement of time of day
clock synchronization. There are three important time constants known to
the server. The \fBmaximum_lease_term\fR sets an upper bound on lease duration.
The \fBclock_skew\fR is added to all lease terms on the server to correct for
differing clock speeds between the client and server and \fBwrite_slack\fR is
the number of seconds the server is willing to wait for a client with
an expired write caching lease to push dirty writes.
.pp
The server maintains a \fBmodify_revision\fR number for each file. It is
defined as an unsigned quadword integer that is never zero and that must
increase whenever the corresponding file is modified on the server.
It is used
by the client to determine whether or not cached data for the file is
stale.
Generating this value is easier said than done.
The current implementation
uses the following technique, which is believed to be adequate.
The high order longword is stored in the ufs inode and is initialized to one
when an inode is first allocated.
The low order longword is stored in main memory only and is initialized to
zero when an inode is read in from disk.
When the file is modified for the first time within a given second of
wall clock time, the high order longword is incremented by one and
the low order longword reset to zero.
For subsequent modifications within the same second of wall clock
time, the low order longword is incremented. If the low order longword wraps
around to zero, the high order longword is incremented again.
Since the high order longword only increments once per second and the inode
is pushed to disk frequently during file modification, this implies
0 \(<= Current\(miDisk \(<= 5.
When the inode is read in from disk, 10
is added to the high order longword, which ensures that the quadword
is greater than any value it could have had before a crash.
This introduces apparent modifications every time the inode falls out of
the LRU inode cache, but this should only reduce the client caching performance
by a (hopefully) small margin.
.sh 2 "Crash Recovery and other Failure Scenarios"
.pp
The server must maintain the state of all the current leases held by clients.
The nice thing about short term leases is that maximum_lease_term seconds
after the server stops issuing leases, there are no current leases left.
As such, server crash recovery does not require any state recovery.
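.pp
One piece of per-file state that does have to survive a crash is the
modify_revision number, and the technique described in the Overview handles it
without a recovery step.
The following is a minimal sketch of that technique in C, not the actual
kernel code.
The names rev_inode, rev_alloc, rev_read_from_disk, rev_modify and rev_value,
and the last_mod_sec field, are assumptions made for illustration, and the
caller is assumed to supply the current wall clock second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-core inode fields; all names here are illustrative. */
struct rev_inode {
    uint32_t rev_hi;        /* high order longword, also kept in the disk inode */
    uint32_t rev_lo;        /* low order longword, kept in main memory only */
    int64_t  last_mod_sec;  /* wall clock second of the last modification */
};

/* A newly allocated inode starts with the high order longword at one. */
void rev_alloc(struct rev_inode *ip)
{
    ip->rev_hi = 1;
    ip->rev_lo = 0;
    ip->last_mod_sec = -1;  /* no modification recorded yet */
}

/* Reading the inode in from disk adds 10 to the stored high order longword. */
void rev_read_from_disk(struct rev_inode *ip, uint32_t disk_hi)
{
    ip->rev_hi = disk_hi + 10;
    ip->rev_lo = 0;
    ip->last_mod_sec = -1;
}

/* Record one modification made during wall clock second now_sec. */
void rev_modify(struct rev_inode *ip, int64_t now_sec)
{
    if (now_sec != ip->last_mod_sec) {
        /* First modification within this second. */
        ip->rev_hi++;
        ip->rev_lo = 0;
        ip->last_mod_sec = now_sec;
    } else if (++ip->rev_lo == 0) {
        /* Low order longword wrapped around to zero. */
        ip->rev_hi++;
    }
}

/* Combine both longwords into the quadword handed to clients. */
uint64_t rev_value(const struct rev_inode *ip)
{
    assert(ip->rev_hi != 0);  /* the quadword is never zero */
    return ((uint64_t)ip->rev_hi << 32) | ip->rev_lo;
}
```

Because the on-disk high order longword lags the in-core one by at most 5,
the addition of 10 in rev_read_from_disk yields a value larger than any
issued before the crash.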
After rebooting, the server refuses to service any RPCs except for writes until
write_slack seconds after the last lease would have expired\**.
.(f
\** The last lease expiry time may be safely estimated as
"boottime+maximum_lease_term+clock_skew" for machines that cannot store
it in nonvolatile RAM.
.)f
By then, the server would not have any outstanding leases to recover the
state of and the clients have had at least write_slack seconds to push dirty
writes to the server and get the server sync'd up to date. After this, the
server simply services requests in a manner similar to NFS.
In an effort to minimize the effect of "recovery storms" [Baker91],
the server replies \fBtry_again_later\fR to the RPCs it is not
yet ready to service.
.pp
After a client crashes, the server may have to wait for a lease to time out
before servicing a request if write sharing of a file with a cacheable lease
on the client is about to occur.
As for the client, it simply starts up getting any leases it now needs. Any
outstanding leases for that client on the server prior to the crash will
either be renewed or expire via timeout.
.pp
Certain network partitioning failures are more problematic. If a client to
server network connection is severed just before a write caching lease expires,
the client cannot push the dirty writes to the server. After the lease expires
on the server, the server permits other clients to access the file with the
potential of getting stale data.
Unfortunately I believe this failure scenario
is intrinsic to any delayed write caching scheme unless the server is required to
wait \fBforever\fR for a client to regain contact\**.
.(f
\** Gray and Cheriton avoid this problem by using a \fBwrite through\fR policy.
.)f
Since the write caching lease has expired on the client,
it will sync up with the
server as soon as the network connection has been re-established.
.pp
There is another failure condition that can occur when the server is congested.
The worst case scenario would have the client pushing dirty writes to the server
but a large request queue on the server delays these writes for more than
\fBwrite_slack\fR seconds. It is hoped that a congestion control scheme using
the \fBtry_again_later\fR RPC reply after booting combined with
the following lease termination rule for write caching leases
can minimize the risk of this occurrence.
A write caching lease is only terminated on the server when there have
been no writes to the file and the server has not been overloaded during
the previous write_slack seconds.
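.pp
The termination rule just stated can be sketched in C as follows.
This is an illustration under stated assumptions rather than the server's
actual code: the write_lease structure, its fields and the function name are
hypothetical, the overloaded flag stands in for whatever overload test the
server applies, and the check that at least write_slack seconds have passed
since expiry is one reading of how write_slack is defined in the Overview.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-lease server state; field names are illustrative. */
struct write_lease {
    int64_t expiry_sec;      /* second at which the lease term ran out */
    int64_t last_write_sec;  /* second of the most recent write to the file */
};

/*
 * Return true if an expired write caching lease may be terminated at
 * now_sec. The overloaded argument reports whether the server was
 * overloaded during the previous write_slack seconds.
 */
bool write_lease_may_terminate(const struct write_lease *lp, int64_t now_sec,
                               int64_t write_slack, bool overloaded)
{
    assert(write_slack >= 0);
    if (now_sec < lp->expiry_sec + write_slack)
        return false;  /* assumed: slack period after expiry not yet over */
    if (now_sec - lp->last_write_sec < write_slack)
        return false;  /* a write arrived within the previous write_slack */
    return !overloaded;
}
```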
Whether the server has been overloaded
is approximated by a test for sleeping nfsd(s) at the end of the write_slack
period.
.sh 2 "Server Disk Full"
.pp
There is a serious unresolved problem for delayed write caching with respect to
server disk space allocation.
When the disk on the file server is full, delayed write RPCs can fail
due to "out of space".
For NFS, this occurrence results in an error return from the close system
call on the file, since the dirty blocks are pushed on close.
Processes writing important files can check for this error return
to ensure that the file was written successfully.
For NQNFS, the dirty blocks are not pushed on close and as such the client
may not attempt the write RPC until after the process has done the close,
which implies no error return from the close.
For the current prototype,
the only solution is to modify programs writing important
file(s) to call fsync and check for an error return from it instead of close.
.sh 2 "Protocol Details"
.pp
The protocol specification is identical to that of NFS [Sun89] except for
the following changes.
.ip \(bu
RPC Information
.(l
        Program Number 300105
        Version Number 1
.)l
.ip \(bu
Readdir_and_Lookup RPC
.(l
        struct readdirlookargs {
                fhandle file;
                nfscookie cookie;
                unsigned count;
                unsigned duration;
        };

        struct entry {
                unsigned cachable;
                unsigned duration;
                modifyrev rev;
                fhandle entry_fh;
                nqnfs_fattr entry_attrib;
                unsigned fileid;
                filename name;
                nfscookie cookie;
                entry *nextentry;
        };

        union readdirlookres switch (stat status) {
        case NFS_OK:
                struct {
                        entry *entries;
                        bool eof;
                } readdirlookok;
        default:
                void;
        };