📄 4.t
字号:
.KE.PPOn our general time sharing systems we find that during the twelvehour period from 8AM to 8PM the system does 500,000 to 1,000,000name translations.Statistics on the performance of both caches show thatthe large performance improvement iscaused by the high hit ratio.The name cache has a hit rate of 70%-80%;the directory offset cache gets a hit rate of 5%-15%.The combined hit rate of the two caches almost always adds up to 85%.With the addition of the two caches,the percentage of system time devoted to name translation hasdropped from 25% to less than 13%.While the system wide cache reduces both the amount of time inthe routines that \fInamei\fP calls as well as \fInamei\fP itself(since fewer directories need to be accessed or searched),it is interesting to note that the actual percentage of systemtime spent in \fInamei\fP itself increases even though theactual time per call decreases.This is because less total time is being spent in the kernel,hence a smaller absolute time becomes a larger total percentage..NH 3Intelligent Auto Siloing.PPMost terminal input hardware can run in two modes:it can either generate an interrupt each time a character is received,or collect characters in a silo that the system then periodically drains.To provide quick response for interactive input and flow control,a silo must be checked 30 to 50 times per second.Ascii terminals normally exhibitan input rate of less than 30 characters per second.At this input ratethey are most efficiently handled with interrupt per character mode,since this generates fewer interrupts than draining the input silosof the terminal multiplexors at each clock interrupt.When input is being generated by another machineor a malfunctioning terminal connection, however,the input rate is usually more than 50 characters per second.It is more efficient to use a device's silo input mode,since this generates fewer interrupts than handling each characteras a separate interrupt.Since a given dialup port may switch between uucp logins and user logins,it is impossible to statically select the most efficient input mode to use..PPWe therefore changed the terminal multiplexor handlersto dynamically choose between the use of the silo and the use ofper-character interrupts. At low input rates the handler processes characters on aninterrupt basis, avoiding the overheadof checking each interface on each clock interrupt.During periods of sustained input, the handler enables the siloand starts a timer to drain input.This timer runs less frequently than the clock interrupts,and is used only when there is a substantial amount of input.The transition from using silos to an interrupt per character isdamped to minimize the number of transitions with bursty traffic(such as in network communication).Input characters serve to flush the silo, preventing long latency.By switching between these two modes of operation dynamically,the overhead of checking the silos is incurred onlywhen necessary..PPIn addition to the savings in the terminal handlers,the clock interrupt routine is no longer required to schedulea software interrupt after each hardware interrupt to drain the silos.The software-interrupt level portion of the clock routine is onlyneeded when timers expire or the current user process is collectingan execution profile.Thus, the number of interrupts attributable to clock processingis substantially reduced..NH 3Process Table Management.PPAs systems have grown larger, the size of the process tablehas grown far past 200 entries.With large tables, linear searches must be eliminatedfrom any frequently used facility.The kernel process table is now multi-threaded to allow selective searchingof active and zombie processes.A third list threads unused process table slots.Free slots can be obtained in constant time by taking onefrom the front of the free list.The number of processes used by a given user may be computed by scanningonly the active list.Since the 4.2BSD release,the kernel maintained linked lists of the descendents of each process.This linkage is now exploited when dealing with process exit;parents seeking the exit status of children now avoid linear searchof the process table, but examine only their direct descendents.In addition, the previous algorithm for finding all descendents of an exitingprocess used multiple linear scans of the process table.This has been changed to follow the links between child process and siblings..PPWhen forking a new process,the system must assign it a unique process identifier.The system previously scanned the entire process table each time it createda new process to locate an identifier that was not already in use.Now, to avoid scanning the process table for each new process,the system computes a range of unused identifiersthat can be directly assigned.Only when the set of identifiers is exhausted is another process tablescan required..NH 3Scheduling.PPPreviously the scheduler scanned the entire process tableonce per second to recompute process priorities.Processes that had run for their entire time slice had theirpriority lowered.Processes that had not used their time slice, or that hadbeen sleeping for the past second had their priority raised.On systems running many processes,the scheduler represented nearly 20% of the system time.To reduce this overhead,the scheduler has been changed to consider onlyrunnable processes when recomputing priorities.To insure that processes sleeping for more than a secondstill get their appropriate priority boost,their priority is recomputed when they are placed back on the run queue.Since the set of runnable process is typically only a small fractionof the total number of processes on the system,the cost of invoking the scheduler drops proportionally..NH 3Clock Handling.PPThe hardware clock interrupts the processor 100 times per secondat high priority.As most of the clock-based events need not be done at high priority,the system schedules a lower priority software interrupt to do the lesstime-critical events such as cpu scheduling and timeout processing.Often there are no such events, and the software interrupt handlerfinds nothing to do and returns. The high priority event now checks to see if there are low priorityevents to process;if there is nothing to do, the software interrupt is not requested.Often, the high priority interrupt occurs during a period when themachine had been running at low priority.Rather than posting a software interrupt that would occur assoon as it returns,the hardware clock interrupt handler simply lowers the processor priorityand calls the software clock routines directly.Between these two optimizations, nearly 80 of the 100 softwareinterrupts per second can be eliminated..NH 3File System.PPThe file system uses a large block size, typically 4096 or 8192 bytes.To allow small files to be stored efficiently, the large blocks canbe broken into smaller fragments, typically multiples of 1024 bytes.To minimize the number of full-sized blocks that must be brokeninto fragments, the file system uses a best fit strategy.Programs that slowly grow files using write of 1024 bytes or lesscan force the file system to copy the data tosuccessively larger and larger fragments until it finallygrows to a full sized block.The file system still uses a best fit strategy the first timea fragment is written.However, the first time that the file system is forced to copy a growingfragment it places it at the beginning of a full sized block.Continued growth can be accommodated without further copyingby using up the rest of the block.If the file ceases to grow, the rest of the block is stillavailable for holding other fragments..PPWhen creating a new file name,the entire directory in which it will reside must be scannedto insure that the name does not already exist.For large directories, this scan is time consuming.Because there was no provision for shortening directories,a directory that is once over-filled will increase the costof file creation even after the over-filling is corrected.Thus, for example, a congested uucp connection can leave a legacy longafter it is cleared up.To alleviate the problem, the system now deletes empty blocksthat it finds at the end of a directory while doing a completescan to create a new name..NH 3Network.PPThe default amount of buffer space allocated for stream sockets (includingpipes) has been increased to 4096 bytes.Stream sockets and pipes now return their buffer sizes in the block size fieldof the stat structure.This information allows the standard I/O library to use more optimal buffering.Unix domain stream sockets also return a dummy device and inode numberin the stat structure to increase compatibilitywith other pipe implementations.The TCP maximum segment size is calculated according to the destinationand interface in use; non-local connections use a more conservative sizefor long-haul networks..PPOn multiply-homed hosts, the local address bound by TCP now always correspondsto the interface that will be used in transmitting data packets for theconnection.Several bugs in the calculation of round trip timing have been corrected.TCP now switches to an alternate gateway when an existing route fails,or when an ICMP redirect message is received.ICMP source quench messages are used to throttle the transmissionrate of TCP streams by temporarily creating an artificially smallsend window, and retransmissions send only a single packetrather than resending all queued data.A send policy has been implementedthat decreases the number of small packets outstandingfor network terminal traffic [Nagle84],providing additional reduction of network congestion.The overhead of packet routing has been decreased by changes in the routingcode and by cacheing the most recently used route for each datagram socket..PPThe buffer management strategy implemented by \fIsosend\fP has beenchanged to make better use of the increased size of the socket buffersand a better tuned delayed acknowledgement algorithm.Routing has been modified to include a one element cache of the lastroute computed.Multiple messages send with the same destination now require less processing.Performance deteriorates because of load ineither the sender host, receiver host, or ether.Also, any CPU contention degrades substantiallythe throughput achievable by user processes [Cabrera85].We have observed empty VAX 11/750s using up to 90% of their cyclestransmitting network messages..NH 3Exec.PPWhen \fIexec\fP-ing a new process, the kernel creates the newprogram's argument list by copying the arguments and environmentfrom the parent process's address space into the system, then back outagain onto the stack of the newly created process.These two copy operations were done one byte at a time, butare now done a string at a time.This optimization reduced the time to processan argument list by a factor of ten;the average time to do an \fIexec\fP call decreased by 25%..NH 3Context Switching.PPThe kernel used to post a software event when it wanted to forcea process to be rescheduled.Often the process would be rescheduled for other reasons beforeexiting the kernel, delaying the event trap.At some later time the process would againbe selected to run and would complete its pending system call,finally causing the event to take place.The event would cause the scheduler to be invoked a second timeselecting the same process to run.The fix to this problem is to cancel any software rescheduleevents when saving a process context.This change doubles the speed with which processescan synchronize using pipes or signals..NH 3Setjmp/Longjmp.PPThe kernel routine \fIsetjmp\fP, that saves the current systemcontext in preparation for a non-local goto used to save many moreregisters than necessary under most circumstances.By trimming its operation to save only the minimum state required,the overhead for system calls decreased by an average of 13%..NH 3Compensating for Lack of Compiler Technology
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -