porttour1

来自「unix v7是最后一个广泛发布的研究型UNIX版本」· 代码 · 共 1,501 行 · 第 1/4 页
TXT
1,501 行
.TLA Tour Through the Portable C Compiler.AUS. C. Johnson.AI.MH.SHIntroduction.PPA C compiler has been implemented that has proved to be quite portable,serving as the basis for C compilers on roughly a dozen machines, includingthe Honeywell 6000, IBM 370, and Interdata 8/32.The compiler is highly compatible with the C language standard..[Kernighan Ritchie Prentice 1978.].PPAmong the goals of this compiler are portability, high reliability,and the use of state-of-the-art techniques and tools wherever practical.Although the efficiency of the compiling process is nota primary goal, the compiler is efficient enough, and producesgood enough code, to serve as a production compiler..PPThe language implemented is highly compatible with the current PDP-11 version of C.Moreover, roughly 75% of the compiler, includingnearly all the syntactic and semantic routines, is machine independent.The compiler also serves as the major portion of the program.I lint ,described elsewhere..[Johnson Lint Program Checker.].PPA number of earlier attempts to make portable compilers are worth noting.While on CO-OP assignment to Bell Labs in 1973, Alan Snyder wrote a portable Ccompiler which was the basis of his Master's Thesis at M.I.T..[Snyder Portable C Compiler.]This compiler was very slow and complicated, and contained a number of ratherserious implementation difficulties;nevertheless, a number of Snyder's ideas appear in this work..PPMost earlier portable compilers, including Snyder's, have proceeded bydefining an intermediate language, perhaps basedon three-address code or code for a stack machine,and writing a machine independent program totranslate from the source code to thisintermediate code.The intermediate code is then read by a second pass, and interpreted or compiled.This approach is elegant, and has a number of advantages, especially if thetarget machine is far removed from the host.It suffers from some disadvantages as well.Some constructions, like initialization and subroutine prologs,are difficult or expensive to express in amachine independent way that still allows themto be easily adapted to the target assemblers.Most of these approaches require a symbol table to be constructedin the second (machine dependent) pass, and/orrequire powerful target assemblers.Also, many conversion operators may be generatedthat have no effect on a given machine,but may be needed on others (for example, pointer to pointerconversions usually do nothing in C, but must be generated becausethere are some machines where they are significant)..PPFor these reasons, the first pass of the portable compileris not entirely machine independent.It contains some machine dependent features, such as initialization, subroutineprolog and epilog, certain storage allocation functions,code for the.I switchstatement, and code to throw out unneeded conversion operators..PPAs a crude measure of the degree ofportability actually achieved,the Interdata 8/32 C compiler hasroughly 600 machine dependent lines of source out of 4600 in Pass 1, and 1000out of 3400 in Pass 2.In total, 1600 out of 8000, or 20%,of the total source is machine dependent(12% in Pass 1, 30% in Pass 2).These percentages can be expected to rise slightly as thecompiler is tuned.The percentage of machine-dependent code for the IBM is 22%, forthe Honeywell 25%.If the assembler format and structure were the same for allthese machines, perhaps another 5-10% of the codewould become machine independent..PPThese figures are sufficiently misleading as to be almostmeaningless.A large fraction of the machine dependent code can be convertedin a straightforward, almost mechanical way.On the other hand, a certain amount of the code requres hardintellectual effort to convert, since the algorithmsembodied in this part of the code are typically complicatedand machine dependent..PPTo summarize, however, if you need a C compiler written for a machine witha reasonable architecture, the compiler is already three quarters finished!.SHOverview.PPThis paper discusses the structure and organization of the portable compiler.The intent is to give the big picture, rather than discussing the details of a particular machineimplementation.After a brief overview and a discussion of the source file structure,the paper describes the major data structures, and then delves moreclosely into the two passes.Some of the theoretical work on which the compiler is based, andits application to the compiler, is discussed elsewhere..[johnson portable theory practice.]One of the major design issues in any C compiler, the design ofthe calling sequence and stack frame, is the subject of a separatememorandum..[johnson lesk ritchie calling sequence.].PPThe compiler consists of two passes,.I pass1and.I pass2 ,that together turn C source code into assembler code for the targetmachine.The two passes are preceded by a preprocessor, thathandles the.B "#define"and.B "#include"statements, and related features (e.g.,.B #ifdef ,etc.).It is a nearly machine independent program, and willnot be further discussed here..PPThe output of the preprocessor is a text file that is read as the standardinput of the first pass.Thisproduces as standard output another text file that becomes the standardinput of the second pass.The second pass produces, as standard output, the desired assembler language source code.The preprocessor and the two passesall write error messages on the standard error file.Thus the compiler itself makes few demands on the I/Olibrary support, aiding in the bootstrapping process..PPAlthough the compiler is divided into two passes,this represents historical accident more than deep necessity.In fact, the compiler can optionally be loadedso that both passes operate in the same program.This ``one pass'' operation eliminates the overhead ofreading and writing the intermediate file, so the compileroperates about 30% faster in this mode.It also occupies about 30% more space than the largerof the two component passes..PPBecause the compiler is fundamentally structured as twopasses, even when loaded as one, this document primarilydescribes the two pass version..PPThe first pass does the lexical analysis, parsing, andsymbol table maintenance.It also constructs parse trees for expressions,and keeps track of the types of the nodes in these trees.Additional code is devoted to initialization.Machine dependent portions of the first pass serve togenerate subroutine prologs and epilogs,code for switches, and code for branches, label definitions,alignment operations,changes of location counter, etc..PPThe intermediate file is a text file organized into lines.Lines beginning with a right parenthesis are copied by the secondpass directly to its output file, with theparenthesis stripped off.Thus, when the first pass produces assembly code, such as subroutine prologs,etc., each line is prefaced with a right parenthesis;the second pass passes these lines to through to the assembler..PPThe major job done by the second pass is generation of codefor expressions.The expression parse trees produced in the first pass arewritten onto theintermediate file in Polish Prefix form:first, there is a line beginning with a period, followed by the sourcefile line number and name on which the expression appeared (for debugging purposes).The successive lines represent the nodes of the parse tree,one node per line.Each line contains the node number, type, andany values (e.g., values of constants)that may appear in the node.Lines representing nodes with descendants are immediatelyfollowed by the left subtree of descendants, then theright.Since the number of descendants of any node is completelydetermined by the node number, there is no need to mark the endof the tree..PPThere are only two other line types in the intermediate file.Lines beginning with a left square bracket (`[') represent the beginning ofblocks (delimited by { ... } in the C source);lines beginning with right square brackets (`]') represent the end of blocks.The remainder of these lines tell how muchstack space, and how many register variables,are currently in use..PPThus, the second pass reads the intermediate files, copies the `)' lines,makes note of the information in the `[' and `]' lines,and devotes most of its effort to the`.' lines and their associated expression trees, turning themturns into assembly code to evaluate the expressions..PPIn the one pass version of the compiler, the expressiontrees that are built by the first pass havebeen declared to have room for the second passinformation as well.Instead of writing the trees onto an intermediate file,each tree is transformed in place into an acceptableform for the code generator.The code generator then writes the result of compilingthis tree onto the standard output.Instead of `[' and `]' lines in the intermediate file, the information is passeddirectly to the second pass routines.Assembly code produced by the first pass is simply written out,without the need for ')' at the head of each line..SHThe Source Files.PPThe compiler source consists of 22 source files.Two files,.I manifestand.I macdefs ,are header files included with all other files..I Manifesthas declarations for the node numbers, types, storage classes,and other global data definitions..I Macdefshas machine-dependent definitions, such as the size and alignment of thevarious data representations.Two machine independent header files,.I mfile1and.I mfile2 ,contain the data structure and manifest definitionsfor the first and second passes, respectively.In the second pass, a machine dependent header file,.I mac2defs ,contains declarations of register names, etc..PPThere is a file,.I common ,containing (machine independent) routines used in both passes.These include routines for allocating and freeing trees, walking over trees,printing debugging information, and printing error messages.There are two dummy files,.I comm1.cand.I comm2.c ,that simply include.I commonwithin the scope of the appropriate pass1 or pass2 header files.When the compiler is loaded as a single pass,.I commononly needs to be included once:.I comm2.cis not needed..PPEntire sections of this document are devoted to the detailed structure of thepasses.For the moment, we just give a brief description of the files.The first passis obtained by compiling and loading.I scan.c ,.I cgram.c ,.I xdefs.c ,.I pftn.c ,.I trees.c ,.I optim.c ,.I local.c ,.I code.c ,and.I comm1.c ..I Scan.cis the lexical analyzer, which is used by.I cgram.c ,the result of applying.I Yacc.[Johnson Yacc Compiler cstr.]to the input grammar.I cgram.y ..I Xdefs.cis a short file of external definitions..I Pftn.cmaintains the symbol table, and does initialization..I Trees.cbuilds the expression trees, and computes the node types..I Optim.cdoes some machine independent optimizations on the expression trees..I Comm1.cincludes.I common ,that contains service routines common to the two passes of the compiler.All the above files are machine independent.The files.I local.cand.I code.ccontain machine dependent code for generating subroutineprologs, switch code, and the like..PPThe second passis produced by compiling and loading.I reader.c ,.I allo.c ,.I match.c ,.I comm1.c ,.I order.c ,.I local.c ,and.I table.c ..I Reader.creads the intermediate file, and controls the major logic of thecode generation..I Allo.ckeeps track of busy and free registers..I Match.ccontrols the matchingof code templates to subtrees of the expression tree to be compiled..I Comm2.cincludes the file.I common ,as in the first pass.The above files are machine independent..I Order.ccontrols the machine dependent details of the code generation strategy..I Local2.chas many small machine dependent routines,and tables of opcodes, register types, etc..I Table.chas the code template tables,which are also clearly machine dependent..SHData Structure Considerations..PPThis section discusses the node numbers, type words, and expression trees, usedthroughout both passes of the compiler..PPThe file.I manifestdefines those symbols used throughout both passes.The intent is to use the same symbol name (e.g., MINUS)for the given operator throughout the lexical analysis, parsing, tree building,and code generation phases;this requires some synchronization with the.I Yaccinput file,.I cgram.y ,as well..PPA tokenlike MINUS may be seen in the lexical analyzer before it is knownwhether it is a unary or binary operator;clearly, it is necessary to know this by the time the parse treeis constructed.Thus, an operator (really a macro) calledUNARY is provided, so that MINUSand UNARY MINUS are both distinct node numbers.Similarly, many binary operators exist in an assignment form (for example, \-=),and the operator ASG may be applied to such node names to generate new ones, e.g. ASG MINUS.
porttour1 - 源码说明

本页面展示了「unix v7是最后一个广泛发布的研究型UNIX版本」中的 porttour1 源码文件，采用编程语言编写，共 1,501 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与unix相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?