.PP
The current compilers available for C do not
do any significant optimization.
Good optimizing compilers are unlikely to be built;
the C language is not well suited to optimization
because of its rampant use of unbound pointers.
Thus, many classical optimizations such as common subexpression
analysis and selection of register variables must be done
by hand using ``exterior'' knowledge of when such optimizations are safe.
.PP
Another optimization usually done by optimizing compilers
is inline expansion of small or frequently used routines.
In past Berkeley systems this was done by using \fIsed\fP to
run over the assembly language and replace calls to small
routines with the code for the body of the routine, often
a single VAX instruction.
While this optimization eliminated the cost of the subroutine
call and return, it did not eliminate the pushing and popping of
several arguments to the routine.
The \fIsed\fP script has been replaced by a more intelligent expander,
\fIinline\fP, that merges the pushes and pops into moves to registers.
For example, when the C code
.DS
if (scanc(map[i], 1, 47, i - 63))
.DE
is compiled into assembly language, it generates the code shown
in the left-hand column of Table 11.
The \fIsed\fP inline expander changes this code to that
shown in the middle column.
The new \fIinline\fP expander eliminates most of the stack
operations to generate the code shown in the right-hand column.
.KF
.TS
center, box;
c s s s s s
c s | c s | c s
l l | l l | l l.
Alternative C Language Code Optimizations
_
cc	sed	inline
_
subl3	$64,_i,\-(sp)	subl3	$64,_i,\-(sp)	subl3	$64,_i,r5
pushl	$47	pushl	$47	movl	$47,r4
pushl	$1	pushl	$1	pushl	$1
mull2	$16,_i,r3	mull2	$16,_i,r3	mull2	$16,_i,r3
pushl	\-56(fp)[r3]	pushl	\-56(fp)[r3]	movl	\-56(fp)[r3],r2
calls	$4,_scanc	movl	(sp)+,r5	movl	(sp)+,r3
tstl	r0	movl	(sp)+,r4	scanc	r2,(r3),(r4),r5
jeql	L7	movl	(sp)+,r3	tstl	r0
		movl	(sp)+,r2	jeql	L7
		scanc	r2,(r3),(r4),r5
		tstl	r0
		jeql	L7
.TE
.ce
Table 11. Alternative inline code expansions.
.KE
.PP
Another optimization involved reevaluating
existing data structures in the context of the current system.
For example, disk buffer hashing was implemented when the system
typically had thirty to fifty buffers.
Most systems today have 200 to 1000 buffers.
Consequently, most of the hash chains contained
ten to a hundred buffers each!
The running time of the low-level buffer management primitives was
dramatically improved simply by enlarging the size of the hash table.
.NH 2
Improvements to Libraries and Utilities
.PP
Intuitively, changes to the kernel would seem to have the greatest
payoff since they affect all programs that run on the system.
However, the kernel has been tuned many times before, so the
opportunity for significant improvement was small.
By contrast, many of the libraries and utilities had never been tuned.
For example, we found utilities that spent 90% of their
running time doing single-character read system calls.
Changing these utilities to use the standard I/O library cut their
running time by a factor of five!
Thus, while most of our time has been spent tuning the kernel,
more than half of the speedups come from improvements in
other parts of the system.
Some of the more dramatic changes are described in the following
subsections.
.NH 3
Hashed Databases
.PP
UNIX provides a set of database management routines, \fIdbm\fP,
that can be used to speed lookups in large data files
with an external hashed index file.
The original version of \fIdbm\fP was designed to work with only one
database at a time.  These routines were generalized to handle
multiple database files, enabling them to be used in rewrites
of the password and host file lookup routines.
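.PP
Generalizing from one database at a time to many amounts to keeping
each database's state in its own handle instead of in global variables.
The C sketch below illustrates that idea in memory only.
The names \fIhdb_open\fP, \fIhdb_store\fP, \fIhdb_fetch\fP and \fIhdb_close\fP,
the fixed bucket count, and the use of record offsets as values are
hypothetical; they are not the \fIdbm\fP interface, which keeps its
hashed index in an external file.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64  /* hypothetical fixed size; a real index can grow */

/* One key in the index, pointing at a record in the data file. */
struct entry {
    char *key;
    long offset;            /* byte offset of the full record */
    struct entry *next;     /* next entry in the same hash chain */
};

/* All state lives in the handle, so several databases
 * (for example, passwords and hosts) can be open at once. */
struct hashdb {
    struct entry *bucket[NBUCKETS];
};

static unsigned hdb_hash(const char *s)
{
    unsigned h = 0;

    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static struct hashdb *hdb_open(void)
{
    return calloc(1, sizeof(struct hashdb));   /* every chain starts empty */
}

static int hdb_store(struct hashdb *db, const char *key, long offset)
{
    size_t len = strlen(key) + 1;
    struct entry *e = malloc(sizeof *e);
    unsigned h = hdb_hash(key);

    if (e == NULL || (e->key = malloc(len)) == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->key, key, len);
    e->offset = offset;
    e->next = db->bucket[h];     /* push onto the front of the chain */
    db->bucket[h] = e;
    return 0;
}

/* Hash the key and walk one short chain instead of scanning the file. */
static long hdb_fetch(const struct hashdb *db, const char *key)
{
    const struct entry *e;

    for (e = db->bucket[hdb_hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->offset;
    return -1;                   /* not found */
}

static void hdb_close(struct hashdb *db)
{
    int i;

    for (i = 0; i < NBUCKETS; i++) {
        struct entry *e = db->bucket[i];
        while (e != NULL) {
            struct entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(db);
}
```

Because each handle is independent, a password lookup and a host lookup
can be served by two open databases in the same program.
.PP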
The new routines
used to access the password file significantly improve the running
time of many important programs such as the mail subsystem,
the C-shell (in doing tilde expansion), \fIls \-l\fP, etc.
.NH 3
Buffered I/O
.PP
The new filesystem with its larger block sizes allows better
performance, but it is possible to degrade system performance
by performing numerous small transfers rather than using
appropriately-sized buffers.
The standard I/O library
automatically determines the optimal buffer size for each file.
Some C library routines and commonly-used programs use low-level
I/O or their own buffering, however.
Several important utilities did not use the standard I/O library
and were buffering I/O using the old optimal buffer size,
1 Kbyte; these programs were changed to buffer I/O according to the
optimal file system block size.
They include the editor, the assembler, the loader, the remote file copy program,
the text formatting programs, and the C compiler.
.PP
The standard error output has traditionally been unbuffered
to prevent delay in presenting the output to the user,
and to prevent it from being lost if buffers are not flushed.
The inordinate expense of sending single-byte packets through
the network led us to impose a buffering scheme on the standard
error stream.
Within a single call to \fIfprintf\fP, all output is buffered temporarily.
Before the call returns, all output is flushed and the stream is again
marked unbuffered.
As before, the normal block or line buffering mechanisms can be used
instead of the default behavior.
.PP
Well-intentioned programs can unintentionally
defeat the standard I/O library's choice of I/O buffer size by using
the \fIsetbuf\fP call to assign an output buffer.
Because of portability requirements, the default buffer size provided
by \fIsetbuf\fP is 1024 bytes; this can lead, once again, to added
overhead.
One such program with this problem was \fIcat\fP;
there are undoubtedly other standard system utilities with similar problems
as the system has
changed much since they were originally written.
.NH 3
Mail System
.PP
The problems discussed in section 3.1.1 prompted significant work
on the entire mail system.  The first problem identified was a bug
in the \fIsyslog\fP program.  The mail delivery program, \fIsendmail\fP,
logs all mail transactions through this process with the 4.2BSD interprocess
communication facilities.  \fISyslog\fP then records the information in
a log file.  Unfortunately, \fIsyslog\fP was performing a \fIsync\fP
operation after each message it received, whether it was logged to a file
or not.  This wreaked havoc on the effectiveness of the
buffer cache and explained, to a large
extent, why sending mail to large distribution lists generated such a
heavy load on the system (one syslog message was generated for each
message recipient, causing an almost continuous sequence of sync operations).
.PP
The hashed database files were
installed in all mail programs, resulting in an order of magnitude
speedup on large distribution lists.  The code in \fI/bin/mail\fP
that notifies the \fIcomsat\fP program when mail has been delivered to
a user was changed to cache host table lookups, resulting in a similar
speedup on large distribution lists.
.PP
Next, the file locking facilities
provided in 4.2BSD, \fIflock\fP\|(2), were used in place of the old
locking mechanism.
The mail system previously used \fIlink\fP and \fIunlink\fP in
implementing file locking primitives.
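.PP
The two locking styles can be contrasted in a short C sketch.
The helper names \fIlink_lock\fP, \fIlink_unlock\fP, \fIflock_lock\fP and
\fIflock_unlock\fP are hypothetical, and the caller is assumed to supply a
unique temporary file name for the old scheme; \fIlink\fP, \fIunlink\fP
and \fIflock\fP are the real system calls.
The old scheme relies on \fIlink\fP failing when the lock name already exists.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Old scheme: create a unique temporary file, then try to link it to
 * the well-known lock name.  link() fails with EEXIST if another
 * process holds the lock.  Every step modifies a directory. */
static int link_lock(const char *tmpname, const char *lockname)
{
    int fd, rc, saved;

    fd = open(tmpname, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    rc = link(tmpname, lockname);  /* the test-and-set step */
    saved = errno;
    unlink(tmpname);               /* the temporary name is no longer needed */
    errno = saved;
    return rc;                     /* 0: locked; -1 with EEXIST: busy */
}

static int link_unlock(const char *lockname)
{
    return unlink(lockname);       /* yet another directory modification */
}

/* New scheme: an advisory lock on the open mailbox, recorded only in
 * kernel memory tables.  If wait is zero, fail instead of sleeping. */
static int flock_lock(int fd, int wait)
{
    return flock(fd, wait ? LOCK_EX : LOCK_EX | LOCK_NB);
}

static int flock_unlock(int fd)
{
    return flock(fd, LOCK_UN);
}
```

Each \fIlink_lock\fP call creates, links, and removes directory entries,
whereas \fIflock_lock\fP touches no directory at all.
.PP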
Because \fIlink\fP and \fIunlink\fP usually modify the contents of directories,
they require synchronous disk operations and cannot take
advantage of the name cache maintained by the system.
\fIUnlink\fP requires that the entry be found in the directory so that
it can be removed; \fIlink\fP requires that the directory be scanned
to ensure that the name does not already exist.
By contrast, the advisory locking facility in 4.2BSD is
efficient because it is done entirely with in-memory tables.
Thus, the mail system was modified to use the file locking primitives.
This yielded another 10% cut in the basic overhead of delivering mail.
Extensive profiling and tuning of \fIsendmail\fP and
compiling it without debugging code reduced the overhead by another 20%.
.NH 3
Network Servers
.PP
With the introduction of the network facilities in 4.2BSD,
a myriad of services became available, each of which
required its own daemon process.
Many of these daemons were rarely if ever used,
yet they lay asleep in the process table consuming
system resources and generally slowing down response.
Rather than having many servers started at boot time, a single server,
\fIinetd\fP, was substituted.
This process reads a simple configuration file
that specifies the services the system is willing to support
and listens for service requests on each service's Internet port.
When a client requests service, the appropriate server is created
and passed a service connection as its standard input.
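.PP
Because \fIinetd\fP hands the server an already-connected socket as its
standard input, a service started this way reduces to code that reads and
writes a single descriptor.
The sketch below is a hypothetical echo service; the names
\fIecho_service\fP and \fIpeer_family\fP are illustrative, and under
\fIinetd\fP the service would be called on descriptor 0.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Copy everything the client sends back to it.  The socket carries
 * both directions; under inetd it would be descriptor 0.  A short
 * write is treated as an error to keep the sketch brief. */
static long echo_service(int fd)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (write(fd, buf, (size_t)n) != n)
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;   /* bytes echoed, or -1 on error */
}

/* Ask the kernel for the client's address family (AF_INET, AF_UNIX, ...)
 * instead of looking anything up in a database file. */
static int peer_family(int fd)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof peer;

    if (getpeername(fd, (struct sockaddr *)&peer, &len) < 0)
        return -1;
    return peer.ss_family;
}
```

Listening and connection queueing stay inside \fIinetd\fP, which is why
the server itself contains none of that code.
.PP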
Servers
that require the identity of their client may use the \fIgetpeername\fP
system call; likewise \fIgetsockname\fP may be used to find out
a server's local address without consulting database files.
This scheme is attractive for several reasons:
.IP \(bu 3
it eliminates
as many as a dozen processes, easing system overhead and
allowing the file and text tables to be made smaller,
.IP \(bu 3
servers need not contain the code required to handle connection
queueing, simplifying the programs, and
.IP \(bu 3
installing and replacing servers becomes simpler.
.PP
With an increased number of networks, both local and external to Berkeley,
we found that the overhead of the routing process was becoming
inordinately high.
Several changes were made in the routing daemon to reduce this load.
Routers on the internal machines no longer exchange routes to external
networks; they exchange only a route to a default gateway.
This reduces the amount of network traffic and the time required
to process routing messages.
In addition, the routing daemon was profiled
and the functions that accounted for large amounts
of time were optimized.
The major changes were a faster hashing scheme
and inline expansions of the ubiquitous byte-swapping functions.
.PP
Under certain circumstances, when output was blocked,
attempts by the remote login process
to send output to the user were rejected by the system,
although a prior \fIselect\fP call had indicated that data could be sent.
This resulted in continuous attempts to write the data until the remote
user restarted output.
This problem was initially worked around in the remote login handler,
and the original problem in the kernel has since been corrected.
.NH 3
The C Run-time Library
.PP
Several people have found poorly tuned code
in frequently used routines in the C library [Lankford84].
In particular, the running time of the string routines can be
cut in half by rewriting them using the VAX string instructions.
The memory allocation routines have been tuned to waste less
memory for allocations
with sizes that are a power of two.
Certain library routines that did file input in one-character reads
have been corrected.
Other library routines, including \fIfread\fP and \fIfwrite\fP,
have been rewritten for efficiency.
.NH 3
Csh
.PP
The C-shell was converted to run on 4.2BSD by
writing a set of routines to simulate the old jobs library.
While this provided a functioning C-shell,
it was grossly inefficient, generating up
to twenty system calls per prompt.
The C-shell has been modified to use the new signal
facilities directly,
cutting the number of system calls per prompt in half.
Additional tuning was done with the help of profiling
to cut the cost of frequently used facilities.