pathname-to-internal-representation translation.
The style of the name translation function is very different in all
three systems.
As described above, the AT&T and DEC systems retain the \fInamei\fP function.
The two are quite different, however, as the ULTRIX interface uses
the \fInamei\fP calling convention introduced in 4.3BSD.
The parameters and context for the name lookup operation
are collected in a \fInameidata\fP structure which is passed to \fInamei\fP
for operation.
Intent to create or delete the named file is declared in advance,
so that the final directory scan in \fInamei\fP may retain information
such as the offset in the directory at which the modification will be made.
Filesystems that use such mechanisms to avoid redundant work
must therefore lock the directory to be modified so that it may not
be modified by another process before completion.
In the System V filesystem, as in previous versions of
.UX ,
this information is stored in the per-process \fIuser\fP structure
by \fInamei\fP for use by a low-level routine called after performing
the actual creation or deletion of the file itself.
In 4.3BSD and in the GFS interface, these side effects of \fInamei\fP
are stored in the \fInameidata\fP structure given as argument to \fInamei\fP,
which is also presented to the routine implementing file creation or deletion.
.PP
The ULTRIX \fInamei\fP routine is responsible for the generic
parts of the name translation process, such as copying the name into
an internal buffer, validating it, interpolating
the contents of symbolic links, and indirecting at mount points.
As in 4.3BSD, the name is copied into the buffer in a single call,
according to the location of the name.
After determining the type of the filesystem at the start of translation
(the current directory or root directory), it calls the filesystem's
\fInamei\fP entry with the same structure it received from its caller.
The filesystem-specific routine translates the name, component by component,
as long as no mount points are
reached.
It may return after any number of components have been processed.
\fINamei\fP performs any processing at mount points, then calls
the correct translation routine for the next filesystem.
Network filesystems may pass the remaining pathname to a server for translation,
or they may look up the pathname components one at a time.
The former strategy would be more efficient,
but the latter scheme allows mount points within a remote filesystem
without server knowledge of all client mounts.
.PP
The AT&T \fInamei\fP interface is presumably the same as that in previous
.UX
systems, accepting the name of a routine to fetch pathname characters
and an operation (one of: lookup, lookup for creation, or lookup for deletion).
It translates, component by component, as before.
If it detects that a mount point crosses to a remote filesystem,
it passes the remainder of the pathname to the remote server.
A pathname-oriented request other than open may be completed
within the \fInamei\fP call,
avoiding return to the (unmodified) system call handler
that called \fInamei\fP.
.PP
In contrast to the first two systems, Sun's VFS interface has replaced
\fInamei\fP with \fIlookupname\fP.
This routine simply calls a new pathname-handling module to allocate
a pathname buffer and copy in the pathname (copying one character per call),
then calls \fIlookuppn\fP.
\fILookuppn\fP performs the iteration over the directories leading
to the destination file; it copies each pathname component to a local buffer,
then calls the filesystem \fIlookup\fP entry to locate the vnode
for that file in the current directory.
Per-filesystem \fIlookup\fP routines may translate only one component
per call.
For file creation and deletion, the lookup operation is unmodified;
the lookup of the final component serves only to check for the existence
of the file.
The subsequent creation or deletion call, if any, must repeat the final
name translation and associated directory scan.
For new file creation in particular, this is rather inefficient,
as
file creation requires two complete scans of the directory.
.PP
Several of the important performance improvements in 4.3BSD
were related to the name translation process [McKusick85][Leffler84].
The following changes were made:
.IP 1. 4
A system-wide cache of recent translations is maintained.
The cache is separate from the inode cache, so that multiple names
for a file may be present in the cache.
The cache does not hold ``hard'' references to the inodes,
so that the normal reference pattern is not disturbed.
.IP 2.
A per-process cache is kept of the directory and offset
at which the last successful name lookup was done.
This allows sequential lookups of all the entries in a directory to be done
in linear time.
.IP 3.
The entire pathname is copied into a kernel buffer in a single operation,
rather than using two subroutine calls per character.
.IP 4.
A pool of pathname buffers is held by \fInamei\fP, avoiding allocation
overhead.
.LP
All of these performance improvements from 4.3BSD are well worth using
within a more generalized filesystem framework.
The generalization of the structure may otherwise make an already-expensive
function even more costly.
Most of these improvements are present in the GFS system, as it derives
from the beta-test version of 4.3BSD.
The Sun system uses a name-translation cache generally like that in 4.3BSD.
The name cache is a filesystem-independent facility provided for the use
of the filesystem-specific lookup routines.
The Sun cache, like that first used at Berkeley but unlike that in 4.3BSD,
holds a ``hard'' reference to the vnode (that is, it increments the reference count).
The ``soft'' reference scheme in 4.3BSD cannot be used with the current
NFS implementation, as NFS allocates vnodes dynamically and frees them
when the reference count returns to zero rather than caching them.
As a result, fewer names may be held in the cache
than (local filesystem) vnodes, and the cache distorts the normal reference
patterns otherwise seen by the LRU cache.
As the name cache references overflow the
local filesystem inode table,
the name cache must be purged to make room in the inode table.
Also, to determine whether a vnode is in use (for example,
before mounting upon it), the cache must be flushed to free any
cache reference.
These problems should be corrected
by the use of the soft cache reference scheme.
.PP
A final observation on the efficiency of name translation in the current
Sun VFS architecture is that the number of subroutine calls used
by a multi-component name lookup is dramatically larger
than in the other systems.
The name lookup scheme in GFS suffers much less from this problem,
and does so without violating the layering.
.PP
A final problem to be considered is synchronization and consistency.
As the filesystem operations are more stylized and broken into separate
entry points for parts of operations, it is more difficult to guarantee
consistency throughout an operation and/or to synchronize with other
processes using the same filesystem objects.
The Sun interface suffers most severely from this,
as it forbids the filesystems from locking objects across calls
to the filesystem.
It is possible that a file may be created between the time that a lookup
is performed and a subsequent creation is requested.
Perhaps more strangely, after a lookup fails to find the target
of a creation attempt, the actual creation might find that the target
now exists and is a symbolic link.
The call will either fail unexpectedly, as the target is of the wrong type,
or the generic creation routine will have to note the error
and restart the operation from the lookup.
This problem will always exist in a stateless filesystem,
but the VFS interface forces all filesystems to share the problem.
This restriction against locking between calls also
forces duplication of work during file creation and deletion.
This is considered unacceptable.
.SH
Support facilities and other interactions
.PP
Several support facilities are used by the current
.UX
filesystem and require generalization for use by other filesystem types.
For
filesystem implementations to be portable,
it is desirable that these modified support facilities
also have a uniform interface and behave consistently in target systems.
A prominent example is the filesystem buffer cache.
The buffer cache in a standard (System V or 4.3BSD)
.UX
system contains physical disk blocks with no reference to the files containing
them.
This works well for the local filesystem, but has obvious problems
for remote filesystems.
Sun has modified the buffer cache routines to describe buffers by vnode
rather than by device.
For remote files, the vnode used is that of the file, and the block
numbers are virtual data blocks.
For local filesystems, a vnode for the block device is used for cache reference,
and the block numbers are filesystem physical blocks.
Use of per-file cache description does not easily accommodate
caching of indirect blocks, inode blocks, superblocks or cylinder group blocks.
However, the vnode describing the block device for the cache
is one created internally,
rather than the vnode for the device looked up when mounting,
and it is located by searching a private list of vnodes
rather than by holding it in the mount structure.
Although the Sun modification makes it possible to use the buffer
cache for data blocks of remote files, a better generalization
of the buffer cache is needed.
.PP
The RFS filesystem used by AT&T does not currently cache data blocks
on client systems; thus the buffer cache is probably unmodified.
The form of the buffer cache in ULTRIX is unknown to us.
.PP
Another subsystem that has a large interaction with the filesystem
is the virtual memory system.
The virtual memory system must read data from the filesystem
to satisfy fill-on-demand page faults.
For efficiency, this read call is arranged to place the data directly
into the physical pages assigned to the process (a ``raw'' read) to avoid
copying the data.
Although the read operation normally bypasses the filesystem buffer cache,
consistency must be maintained by checking
the buffer cache and copying
or flushing modified data not yet stored on disk.
The 4.2BSD virtual memory system, like that of Sun and ULTRIX,
maintains its own cache of reusable text pages.
This creates additional complications.
As the virtual memory systems are redesigned, these problems should be
resolved by reading through the buffer cache, then mapping the cached
data into the user address space.
If the buffer cache or the process pages are changed while the other reference
remains, the data would have to be copied (``copy-on-write'').
.PP
In the meantime, the current virtual memory systems must be used
with the new filesystem framework.
Both the Sun and AT&T filesystem interfaces
provide entry points to the filesystem for optimization of the virtual
memory system by performing logical-to-physical block number translation
when setting up a fill-on-demand image for a process.
The VFS provides a vnode operation analogous to the \fIbmap\fP function of the
.UX
filesystem.
Given a vnode and a logical block number, it returns a vnode and block number
which may be read to obtain the data.
If the filesystem is local, it returns the private vnode for the block device
and the physical block number.
As the \fIbmap\fP operations are all performed at one time, during process
startup, any indirect blocks for the file will remain in the cache
once they have been read.
In addition, the interface provides a \fIstrategy\fP entry that may be used
for ``raw'' reads from a filesystem device,
that is, to read data blocks into an address space without copying.
This entry uses a buffer header (\fIbuf\fP structure)
to describe the I/O operation
instead of a \fIuio\fP structure.
The buffer-style interface is the same as that used by disk drivers internally.
This difference allows the current \fIuio\fP primitives to be avoided,
as they copy all data to/from the current user process address space.
Instead, for local filesystems these operations could be done internally
with the standard raw disk read routines,
which use a \fIuio\fP
interface.
When loading from a remote filesystem,
the data will be received in a network buffer.
If network buffers are suitably aligned,
the data may be mapped into the process address space by a page swap
without copying.
In either case, it should be possible to use the standard filesystem
read entry from the virtual memory system.
.PP
Other issues that must be considered in devising a portable
filesystem implementation include kernel memory allocation;
the implicit use of user-structure global context,
which may create problems with reentrancy;
the style of the system call interface;
and the conventions for synchronization
(sleep/wakeup, handling of interrupted system calls, semaphores).
.SH
The Berkeley Proposal
.PP
The Sun VFS interface has been the most widely used of the three described here.
It is also the most general of the three, in that filesystem-specific
data and operations are best separated from the generic layer.
Although it has several disadvantages which were described above,
most of them may be corrected with minor changes to the interface
(and, in a few areas, philosophical changes).
The DEC GFS has other advantages, in particular the use of the 4.3BSD
\fInamei\fP interface and optimizations.
It allows single or multiple components of a pathname
to be translated in a single call to the specific filesystem
and thus accommodates filesystems with either preference.
The FSS is the least well understood, as there is little public information
about the interface.
However, its design goals are the least consistent with those of the Berkeley
research groups.
Accordingly, a new filesystem interface has been devised to avoid
some of the problems in the other systems.
The proposed interface derives directly from Sun's VFS,
but, like GFS, uses a 4.3BSD-style name lookup interface.
Additional context information has been moved from the \fIuser\fP structure
to the \fInameidata\fP structure so that name translation may be independent
of the global context of a user process.
This is especially desirable in
any system where kernel-mode servers
operate as light-weight or interrupt-level processes,
or where a server may store or cache context for several clients.
This calling interface has the additional advantage
that the call parameters need not all be pushed onto the stack for each call
through the filesystem interface,
and they may be accessed using short offsets from a base pointer
(unlike global variables in the \fIuser\fP structure).
.PP
The proposed filesystem interface is described very tersely here.
For the most part, data structures and procedures are analogous
to those used by VFS, and only the changes will be treated here.
See [Kleiman86] for complete descriptions of the vfs and vnode operations
in Sun's interface.
.PP
The central data structure for name translation is the \fInameidata\fP
structure.
The same structure is used to pass parameters to \fInamei\fP,
to pass these same parameters to filesystem-specific lookup routines,
to communicate completion status from the lookup routines back to \fInamei\fP,
and to return completion status to the calling routine.
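.PP
The division of labor described above can be sketched in C as a toy model.
It is a minimal sketch under stated assumptions, not the proposed interface itself:
the field names (\fIni_path\fP, \fIni_dvp\fP, \fIni_offset\fP, and so on),
the \fIvnodeops\fP table with a single \fIvop_lookup\fP entry, the helper \fIndinit\fP,
and the fixed-size in-memory directories are all hypothetical.
A single \fInameidata\fP structure carries the caller's parameters into \fInamei\fP,
is handed unchanged to the filesystem-specific lookup routine, and carries results back.
The filesystem routine may translate any number of components, returning early at a mount point;
\fInamei\fP then crosses the mount point and calls the next filesystem's lookup entry
with the same structure.
For a creation, the directory and free slot found by the final scan are left in the structure,
so that a later creation routine need not rescan the directory.
Locking is not modelled.
.DS L
```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define NSLOT 4                               /* entries per toy directory (hypothetical) */

enum nameiop { LOOKUP, CREATE, DELETE };      /* intent declared in advance */

/* One structure carries parameters in and results out at every level. */
struct nameidata {
    const char *ni_path;      /* in/out: remaining pathname */
    enum nameiop ni_op;       /* in: lookup, create or delete */
    struct vnode *ni_cdir;    /* in/out: directory where translation resumes */
    struct vnode *ni_vp;      /* out: vnode found, or NULL */
    struct vnode *ni_dvp;     /* out: directory holding the last component */
    int ni_offset;            /* out: slot found, or free slot for a create */
    int ni_ncomp;             /* out: components translated so far */
};

/* Per-filesystem entry points; only lookup is modelled. */
struct vnodeops {
    int (*vop_lookup)(struct nameidata *ndp);
};

/* Toy in-memory vnode: a directory with a fixed number of slots. */
struct vnode {
    const char *v_name;
    const struct vnodeops *v_op;
    struct vnode *v_child[NSLOT];   /* NULL marks an empty slot */
    struct vnode *v_mounted;        /* root of a filesystem mounted here, or NULL */
};

/* Hypothetical helper that fills in the caller's parameters. */
void ndinit(struct nameidata *ndp, const char *path, enum nameiop op, struct vnode *start)
{
    *ndp = (struct nameidata){ path, op, start, NULL, NULL, -1, 0 };
}

/* Scan a directory for a component, remembering the first free slot. */
static int dir_scan(struct vnode *dir, const char *comp, size_t len, int *freeslot)
{
    *freeslot = -1;
    for (int i = 0; i < NSLOT; i++) {
        struct vnode *c = dir->v_child[i];
        if (c == NULL && *freeslot < 0)
            *freeslot = i;
        else if (c != NULL && strlen(c->v_name) == len &&
                 strncmp(c->v_name, comp, len) == 0)
            return i;
    }
    return -1;
}

/* Filesystem-specific lookup: translates any number of components,
   returning early at a mount point so that namei can cross it. */
static int toyfs_lookup(struct nameidata *ndp)
{
    struct vnode *dir = ndp->ni_cdir;
    for (;;) {
        while (*ndp->ni_path == '/')
            ndp->ni_path++;
        if (*ndp->ni_path == 0) {             /* pathname exhausted */
            ndp->ni_vp = dir;
            return 0;
        }
        size_t len = strcspn(ndp->ni_path, "/");
        int freeslot, slot = dir_scan(dir, ndp->ni_path, len, &freeslot);
        ndp->ni_path += len;
        ndp->ni_ncomp++;
        ndp->ni_dvp = dir;                    /* side effects kept for create/delete */
        ndp->ni_offset = slot >= 0 ? slot : freeslot;
        if (slot < 0) {
            /* A missing final component is not an error when creating. */
            int last = ndp->ni_path[strspn(ndp->ni_path, "/")] == 0;
            ndp->ni_vp = NULL;
            return (last && ndp->ni_op == CREATE) ? 0 : ENOENT;
        }
        ndp->ni_vp = dir->v_child[slot];
        if (ndp->ni_vp->v_mounted != NULL)    /* mount point: let namei cross it */
            return 0;
        dir = ndp->ni_vp;
    }
}

const struct vnodeops toyfs_ops = { toyfs_lookup };

/* Generic namei: calls the current filesystem's lookup entry with the same
   nameidata, indirecting through mount points between calls. */
int namei(struct nameidata *ndp)
{
    assert(ndp->ni_cdir != NULL);
    for (;;) {
        int error = ndp->ni_cdir->v_op->vop_lookup(ndp);
        if (error != 0 || ndp->ni_vp == NULL || ndp->ni_vp->v_mounted == NULL)
            return error;
        ndp->ni_cdir = ndp->ni_vp->v_mounted;   /* continue in the mounted filesystem */
    }
}
```
.DE
.PP
With a root directory containing \fIusr\fP, on which a second filesystem is mounted,
looking up \fI/usr/bin\fP makes two calls to the lookup entry, one per filesystem,
and creating \fI/usr/newfile\fP returns a null \fIni_vp\fP with \fIni_dvp\fP and \fIni_offset\fP
describing where the new entry would go.
In a real kernel the directory named by \fIni_dvp\fP would have to remain locked
until the creation completes, as discussed earlier.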