total	24.2 ms/call	19.2%
.TE
.ce
Table 4. Call times for \fInamei\fP in 4.2BSD.
.DE
.KE
.NH 3
Clock processing
.PP
Nearly 25% of the time spent in the kernel is spent in the clock
processing routines.
(This is a clear indication that to avoid sampling bias when profiling the
kernel with our tools
we need to drive them from an independent clock.)
These routines are responsible for implementing timeouts,
scheduling the processor,
maintaining kernel statistics,
and tending various hardware operations such as
draining the terminal input silos.
Only minimal work is done in the hardware clock interrupt
routine (at high priority); the rest is performed (at a lower priority)
in a software interrupt handler scheduled by the hardware interrupt
handler.
In the worst case, with a clock rate of 100 Hz
and with every hardware interrupt scheduling a software
interrupt, the processor must field 200 interrupts per second.
The overhead of simply trapping and returning
is 3% of the machine cycles;
figuring out that there is nothing to do
requires an additional 2%.
.NH 3
Terminal multiplexors
.PP
The terminal multiplexors supported by 4.2BSD have programmable receiver
silos that may be used in two ways.
With the silo disabled, each character received causes an interrupt
to the processor.
Enabling the receiver silo allows the silo to fill before
generating an interrupt, allowing multiple characters to be read
for each interrupt.
At low rates of input, received characters will not be processed
for some time unless the silo is emptied periodically.
The 4.2BSD kernel uses the input silos of each terminal multiplexor,
and empties each silo on each clock interrupt.
This allows high input rates without the cost of per-character interrupts
while assuring low latency.
However, as character input rates on most machines are usually
low (about 25 characters per second),
this can result in excessive overhead.
At the current clock rate of 100 Hz, a machine with 5 terminal multiplexors
configured makes 500 calls to the receiver interrupt routines per second.
In addition, to achieve acceptable input latency
for flow control, each clock interrupt must schedule
a software interrupt to run the silo draining routines.\**
.FS
\** It is not possible to check the input silos at
the time of the actual clock interrupt without modifying the terminal
line disciplines, as the input queues may not be in a consistent state.
.FE
This implies that the worst case estimate for clock processing
is the basic overhead for clock processing.
.NH 3
Process table management
.PP
In 4.2BSD there are numerous places in the kernel where a linear search
of the process table is performed:
.IP \(bu 3
in \fIexit\fP to locate and wakeup a process's parent;
.IP \(bu 3
in \fIwait\fP when searching for \fB\s-2ZOMBIE\s+2\fP and
\fB\s-2STOPPED\s+2\fP processes;
.IP \(bu 3
in \fIfork\fP when allocating a new process table slot and
counting the number of processes already created by a user;
.IP \(bu 3
in \fInewproc\fP, to verify
that a process id assigned to a new process is not currently
in use;
.IP \(bu 3
in \fIkill\fP and \fIgsignal\fP to locate all processes to
which a signal should be delivered;
.IP \(bu 3
in \fIschedcpu\fP when adjusting the process priorities every
second; and
.IP \(bu 3
in \fIsched\fP when locating a process to swap out and/or swap
in.
.LP
These linear searches can incur significant overhead.
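.PP
The following sketch illustrates the kind of scan these routines perform.
It is not the 4.2BSD source; the structure layout, the field names, and the
\fIpfind\fP helper shown here are simplified assumptions made for
illustration only.
.DS L
/*
 * Illustrative sketch (not the 4.2BSD code): a linear scan over a
 * statically sized process table.  Every lookup pays a cost
 * proportional to the table size, whether or not the process exists.
 */
#define NPROC   404             /* e.g. 20 + 8 * maxusers with maxusers = 48 */

struct proc {
        short   p_pid;          /* process id; 0 marks an unused slot */
        short   p_ppid;         /* parent process id */
} proc[NPROC];                  /* hypothetical table; fields are assumptions */

/* find the slot holding a given process id, or return 0 if none does */
struct proc *
pfind(int pid)
{
        register struct proc *p;

        for (p = proc; p < &proc[NPROC]; p++)
                if (p->p_pid == pid)
                        return (p);
        return ((struct proc *)0);
}
.DE
Because each of the operations listed above repeats a scan of this form,
its cost grows with the size of the table rather than with the number of
active processes.
.LP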
The rule for calculating the size of the process table is:
.ce
nproc = 20 + 8 * maxusers
.sp
that means a 48-user system will have a 404-slot process table.
With the addition of network services in 4.2BSD, as many as a dozen
server processes may be maintained simply to await incoming requests.
These servers are normally created at boot time, which causes them
to be allocated slots near the beginning of the process table.  This
means that process table searches under 4.2BSD are likely to take
significantly longer than under 4.1BSD.  System profiling shows
that as much as 20% of the time spent in the kernel on a loaded
system (a VAX-11/780) can be spent in \fIschedcpu\fP and, on average,
5-10% of the kernel time is spent in \fIschedcpu\fP.
The other searches of the proc table are similarly affected.
This shows the system can no longer tolerate using linear searches of
the process table.
.NH 3
File system buffer cache
.PP
The trace facilities described in section 2.3 were used
to gather statistics on the performance of the buffer cache.
We were interested in measuring the effectiveness of the
cache and the read-ahead policies.
With the file system block size in 4.2BSD four to
eight times that of a 4.1BSD file system, we were concerned
that large amounts of read-ahead might be performed without
being used.  Also, we were interested in seeing if the
rules used to size the buffer cache at boot time were severely
affecting the overall cache operation.
.PP
The tracing package was run for three hours during
a peak mid-afternoon period on a VAX 11/780 with four megabytes
of physical memory.
This resulted in a buffer cache containing 400 kilobytes of memory
spread among 50 to 200 buffers
(the actual number of buffers depends on the size mix of
disk blocks being read at any given time).
The pertinent configuration information is shown in Table 5.
.KF
.DS L
.TS
center box;
l l l l.
Controller	Drive	Device	File System
_
DEC MASSBUS	DEC RP06	hp0d	/usr
		hp0b	swap
Emulex SC780	Fujitsu Eagle	hp1a	/usr/spool/news
		hp1b	swap
		hp1e	/usr/src
		hp1d	/u0 (users)
	Fujitsu Eagle	hp2a	/tmp
		hp2b	swap
		hp2d	/u1 (users)
	Fujitsu Eagle	hp3a	/
.TE
.ce
Table 5. Active file systems during buffer cache tests.
.DE
.KE
.PP
During the test period the load average ranged from 2 to 13
with an average of 5.
The system had no idle time, 43% user time, and 57% system time.
The system averaged 90 interrupts per second
(excluding the system clock interrupts),
220 system calls per second,
and 50 context switches per second (40 voluntary, 10 involuntary).
.PP
The active virtual memory (the sum of the address space sizes of
all jobs that have run in the previous twenty seconds)
over the period ranged from 2 to 6 megabytes with an average
of 3.5 megabytes.
There was no swapping, though the page daemon was inspecting
about 25 pages per second.
.PP
On average 250 requests to read disk blocks were initiated
per second.
These include read requests for file blocks made by user programs
as well as requests initiated by the system.
System reads include requests for indexing information to determine
where a file's next data block resides,
file system layout maps to allocate new data blocks,
and requests for directory contents needed to do path name translations.
.PP
On average, an 85% cache hit rate was observed for read requests.
Thus only 37 disk reads were initiated per second.
In addition, 5 read-ahead requests were made each second,
filling about 20% of the buffer pool.
Despite the policies to rapidly reuse read-ahead buffers
that remain unclaimed, more than 90% of the read-ahead
buffers were used.
.PP
These measurements showed that the buffer cache was working
effectively.  Independent tests have also shown that the size
of the buffer cache may be reduced significantly on memory-poor
systems without severe effects;
we have not yet tested this hypothesis [Shannon83].
.NH 3
Network subsystem
.PP
The overhead associated with the network facilities found in 4.2BSD is often
difficult to gauge without profiling the system.
This is because most input processing is performed
in modules scheduled with software interrupts.
As a result, the system time spent performing protocol
processing is rarely attributed to the processes that
really receive the data.  Since the protocols supported
by 4.2BSD can involve significant overhead, this was a serious
concern.  Results from a profiled kernel show an average
of 5% of the system time is spent
performing network input and timer processing in our environment
(a 3Mb/s Ethernet with most traffic using TCP).
This figure can vary significantly depending on
the network hardware used, the average message
size, and whether packet reassembly is required at the network
layer.  On one machine we profiled over a 17 hour
period (our gateway to the ARPANET),
206,000 input messages accounted for 2.4% of the system time,
while another 0.6% of the system time was spent performing
protocol timer processing.
This machine was configured with an ACC LH/DH IMP interface
and a DMA 3Mb/s Ethernet controller.
.PP
The performance of TCP over slower long-haul networks
was degraded substantially by two problems.
The first problem was a bug that prevented round-trip timing measurements
from being made, thus increasing retransmissions unnecessarily.
The second was the maximum segment size chosen by TCP,
which was well-tuned for the Ethernet but poorly chosen for
the ARPANET, where it caused packet fragmentation.
(The maximum segment size was actually negotiated upwards to a value that
resulted in excessive fragmentation.)
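.PP
The link between the missing round-trip measurements and the unnecessary
retransmissions can be seen from the classic smoothed round-trip time
estimator used by TCP implementations of this era (RFC 793).
The sketch below is illustrative only; the constants and the initial
estimate are typical values chosen for the example, not the 4.2BSD
constants.
.DS L
/*
 * Illustrative sketch of an RFC 793 style retransmission timer
 * (not the 4.2BSD code; ALPHA, BETA, and the 1 second starting
 * estimate are assumptions).  If no round-trip samples are ever
 * supplied, as with the bug described above, srtt never adapts to
 * the path and the timeout stays poorly matched to the network.
 */
#define ALPHA   0.9             /* smoothing gain on the estimate */
#define BETA    2.0             /* variance fudge factor */

static double srtt = 1.0;       /* smoothed round-trip time, seconds */

/* fold one measured sample into the estimate; return the new timeout */
double
tcp_rtt_update(double measured_rtt)
{
        srtt = ALPHA * srtt + (1.0 - ALPHA) * measured_rtt;
        return (BETA * srtt);
}
.DE
On a long-haul path whose round-trip times are far longer than those of the
local Ethernet, a timer that never adapts expires too early and retransmits
data that is still in flight.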
.PP
When benchmarked in Ethernet environments, the main memory buffer management
of the network subsystem presented some performance anomalies.
The overhead of processing small ``mbufs'' severely affected throughput for a
substantial range of message sizes.
Although most system utilities made use of the throughput-optimal
1024 byte size, user processes faced large degradations for some
arbitrary sizes.  This was especially true for TCP/IP transmissions [Cabrera84,
Cabrera85].
.NH 3
Virtual memory subsystem
.PP
We ran a set of tests intended to exercise the virtual
memory system under both 4.1BSD and 4.2BSD.
The tests are described in Table 6.
The test programs dynamically allocated
a 7.3 Megabyte array (using \fIsbrk\fP\|(2)) then referenced
pages in the array sequentially, in a purely random
fashion, or such that the distance between
successive pages accessed was randomly selected from a Gaussian
distribution.  In the last case, successive runs were made with
increasing standard deviations.
(A sketch of a program of this form is given after Table 8.)
.KF
.DS L
.TS
center box;
l | l.
Test	Description
_
seqpage	sequentially touch pages, 10 iterations
seqpage-v	as above, but first make \fIvadvise\fP\|(2) call
randpage	touch random page 30,000 times
randpage-v	as above, but first make \fIvadvise\fP call
gausspage.1	30,000 Gaussian accesses, standard deviation of 1
gausspage.10	as above, standard deviation of 10
gausspage.30	as above, standard deviation of 30
gausspage.40	as above, standard deviation of 40
gausspage.50	as above, standard deviation of 50
gausspage.60	as above, standard deviation of 60
gausspage.80	as above, standard deviation of 80
gausspage.inf	as above, standard deviation of 10,000
.TE
.ce
Table 6. Paging benchmark programs.
.DE
.KE
.PP
The results in Table 7 show how the additional
memory requirements
of 4.2BSD can generate more work for the paging system.
Under 4.1BSD,
the system used 0.5 of the 4.5 megabytes of physical memory
on the test machine;
under 4.2BSD it used nearly 1 megabyte of physical memory.\**
.FS
\** The 4.1BSD system used for testing was really a 4.1a system configured
with networking facilities and code to support
remote file access.  The
4.2BSD system also included the remote file access code.  Since both
systems would be larger than similarly configured ``vanilla''
4.1BSD or 4.2BSD systems, we consider our conclusions to still be valid.
.FE
This resulted in more page faults and, hence, more system time.
To establish a common ground on which to compare the paging
routines of each system, we instead compare the average page fault
service times for those test runs that had a statistically significant
number of random page faults.  These figures, shown in Table 8, show
no significant difference between the two systems in
the area of page fault servicing.
We currently have
no explanation for the results of the sequential
paging tests.
.KF
.DS L
.TS
center box;
l || c s || c s || c s || c s
l || c s || c s || c s || c s
l || c | c || c | c || c | c || c | c
l || n | n || n | n || n | n || n | n.
Test	Real	User	System	Page Faults
\^	_	_	_	_
\^	4.1	4.2	4.1	4.2	4.1	4.2	4.1	4.2
=
seqpage	959	1126	16.7	12.8	197.0	213.0	17132	17113
seqpage-v	579	812	3.8	5.3	216.0	237.7	8394	8351
randpage	571	569	6.7	7.6	64.0	77.2	8085	9776
randpage-v	572	562	6.1	7.3	62.2	77.5	8126	9852
gausspage.1	25	24	23.6	23.8	0.8	0.8	8	8
gausspage.10	26	26	22.7	23.0	3.2	3.6	2	2
gausspage.30	34	33	25.0	24.8	8.6	8.9	2	2
gausspage.40	42	81	23.9	25.0	11.5	13.6	3	260
gausspage.50	113	175	24.2	26.2	19.6	26.3	784	1851
gausspage.60	191	234	27.6	26.7	27.4	36.0	2067	3177
gausspage.80	312	329	28.0	27.9	41.5	52.0	3933	5105
gausspage.inf	619	621	82.9	85.6	68.3	81.5	8046	9650
.TE
.ce
Table 7. Paging benchmark results (all times in seconds).
.DE
.KE
.KF
.DS L
.TS
center box;
c || c s || c s
c || c s || c s
c || c | c || c | c
l || n | n || n | n.
Test	Page Faults	PFST
\^	_	_
\^	4.1	4.2	4.1	4.2
=
randpage	8085	9776	791	789
randpage-v	8126	9852	765	786
gausspage.inf	8046	9650	848	844
.TE
.ce
Table 8. Page fault service times (all times in microseconds).
.DE
.KE
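.PP
As a concrete illustration of the access patterns listed in Table 6, a test
program of this sort can be written in a few lines of C.
The sketch below is a reconstruction, not the actual benchmark source:
the array and page sizes, the iteration counts, and the use of the standard
\fIrand\fP\|(3) generator are assumptions, and the \fIvadvise\fP variants
and the purely random test are not shown (the latter corresponds to picking
each page uniformly rather than by a Gaussian stride).
.DS L
/*
 * Illustrative sketch of the paging benchmarks in Table 6 (not the
 * actual test programs).  A large array is obtained with sbrk(2) and
 * then touched one page per reference, either sequentially or with a
 * Gaussian stride whose standard deviation is given on the command line.
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <unistd.h>

#define ARRAYSIZE (7300 * 1024)         /* roughly 7.3 megabytes */
#define PAGESIZE  1024                  /* assumed page size in bytes */
#define NPAGES    (ARRAYSIZE / PAGESIZE)
#define NTOUCH    30000                 /* references in the Gaussian tests */

int
main(int argc, char *argv[])
{
        char *base = sbrk(ARRAYSIZE);   /* grow the data segment */
        double sigma = (argc > 1) ? atof(argv[1]) : 0.0;
        long page = 0;
        int i, iter;

        if (base == (char *)-1) {
                perror("sbrk");
                return (1);
        }
        if (sigma == 0.0) {                     /* seqpage */
                for (iter = 0; iter < 10; iter++)
                        for (i = 0; i < NPAGES; i++)
                                base[(long)i * PAGESIZE] = 1;
        } else {                                /* gausspage.N */
                for (i = 0; i < NTOUCH; i++) {
                        /* Box-Muller transform: Gaussian stride, sd = sigma */
                        double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
                        double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
                        long stride = (long)(sigma * sqrt(-2.0 * log(u1)) *
                            cos(6.283185307179586 * u2));
                        page = (page + stride % NPAGES + NPAGES) % NPAGES;
                        base[page * PAGESIZE] = 1;
                }
        }
        return (0);
}
.DE
A run such as gausspage.50 then amounts to invoking the program with a
standard deviation of 50; as the deviation grows, successive references land
on pages farther apart, and the fault counts in Table 7 rise accordingly.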