📄 porttour1
字号:
.PPIt is frequently desirable to know if a node represents a leaf (no descendants), a unary operator (onedescendant) or a binary operator (two descendants).The macro.I optype(o)returns one of the manifest constants LTYPE, UTYPE, or BITYPE, respectively, dependingon the node number.I o .Similarly,.I asgop(o)returns true if.I ois an assignment operator number (=, +=, etc. ), and.I logop(o)returns true if.I ois a relational or logical (&&, \(or\(or, or !) operator..PPC has a rich typing structure, with a potentially infinite number of types.To begin with, there are the basic types: CHAR, SHORT, INT, LONG, the unsigned versionsknown asUCHAR, USHORT, UNSIGNED, ULONG, and FLOAT, DOUBLE,and finallySTRTY (a structure), UNIONTY, and ENUMTY.Then, there are three operators that can be applied to types to make others:if.I tis a type, we may potentially have types.I "pointer to t" ,.I "function returning t" ,and.I "array of t's"generated from.I t .Thus, an arbitrary type in C consists of a basic type, and zero or more of these operators..PPIn the compiler, a type is represented by an unsigned integer; the rightmost four bits hold the basictype, and the remaining bits are divided into two-bit fields, containing0 (no operator), or one of the three operatorsdescribed above.The modifiers are read right to left in the word, starting with the two-bitfield adjacent to the basic type, until a field with 0 in it is reached.The macros PTR, FTN, and ARY represent the.I "pointer to" ,.I "function returning" ,and.I "array of"operators.The macro values are shifted so that they align with the first two-bitfield; thus PTR+INTrepresents the type for an integer pointer, and.DSARY + (PTR<<2) + (FTN<<4) + DOUBLE.DErepresents the type of an array of pointers to functions returning doubles..PPThe type words are ordinarily manipulated by macros.If.I tis a type word,.I BTYPE(t)gives the basic type..I ISPTR(t) ,.I ISARY(t) ,and.I ISFTN(t)ask if an object of this type is a pointer, array, or a function,respectively..I MODTYPE(t,b)sets the basic type of.I tto.I b ..I DECREF(t)gives the type resulting from removing the first operator from.I t .Thus, if.I tis a pointer to.I t' ,a function returning.I t' ,or an array of.I t' ,then.I DECREF(t)would equal.I t' ..I INCREF(t)gives the type representing a pointer to.I t .Finally, there are operators for dealing with the unsigned types..I ISUNSIGNED(t)returns true if.I tis one of the four basic unsigned types;in this case,.I DEUNSIGN(t)gives the associated `signed' type.Similarly,.I UNSIGNABLE(t)returns true if.I tis one of the four basic types that could become unsigned, and.I ENUNSIGN(t)returns the unsigned analogue of.I tin this case..PPThe other important global data structure is that of expression trees.The actual shapes of the nodes are given in.I mfile1and.I mfile2 .They are not the same in the two passes; the first pass nodes containdimension and size information, while the second passnodes contain register allocation information.Nevertheless, all nodes contain fields called.I op ,containing the node number,and.I type ,containing the type word.A function called.I talloc()returns a pointer to a new tree node.To free a node, its.I opfield need merely be set to FREE.The other fields in the node will remain intact at least until the next allocation..PPNodes representing binary operators contain fields,.I leftand.I right ,that contain pointers to the left and right descendants.Unary operator nodes have the.I leftfield, and a value field called.I rval .Leaf nodes, with no descendants, have two value fields:.I lvaland.I rval ..PPAt appropriate times, the function.I tcheck()can be called, to check that there are no busy nodes remaining.This is used as a compiler consistency check.The function.I tcopy(p)takes a pointer.I pthat points to an expression tree, and returns a pointerto a disjoint copy of the tree.The function.I walkf(p,f)performs a postorder walk of the tree pointed to by.I p ,and applies the function.I fto each node.The function.I fwalk(p,f,d)does a preorder walk of the tree pointed to by.I p .At each node, it calls a function.I f ,passing to it the node pointer, a value passed downfrom its ancestor, and two pointers to values to be passeddown to the left and right descendants (if any).The value.I dis the value passed down to the root..a.I Fwalkis used for a number of tree labeling and debugging activities..PPThe other major data structure, the symbol table, exists only in pass one,and will be discussed later..SHPass One.PPThe first pass does lexical analysis, parsing, symbol table maintenance,tree building, optimization, and a number of machine dependent things.This pass is largely machine independent, and the machine independentsections can be pretty successfully ignored.Thus, they will be only sketched here..SHLexical Analysis.PPThe lexical analyzer is a conceptually simple routine that reads the inputand returns the tokens of the C language as it encounters them:names, constants, operators, and keywords.The conceptual simplicity of this job is confounded a bit by severalother simple jobs that unfortunately must go on simultaneously.These include.IP \(buKeeping track of the current filename and line number, and occasionallysetting this information as the result of preprocessor control lines..IP \(buSkipping comments..IP \(buProperly dealing with octal, decimal, hex, floatingpoint, and character constants, as well as character strings..PPTo achieve speed, the program maintains several tablesthat are indexed into by character value, to tellthe lexical analyzer what to do next.To achieve portability, these tables must be initializedeach time the compiler is run, in order that thetable entries reflect the local character set values..SHParsing.PPAs mentioned above, the parser is generated by Yacc from the grammar on file.I cgram.y.The grammar is relatively readable, but contains some unusual featuresthat are worth comment..PPPerhaps the strangest feature of the grammar is the treatment ofdeclarations.The problem is to keep track of the basic typeand the storage class while interpreting the variousstars, brackets, and parentheses that may surround a given name.The entire declaration mechanism must be recursive,since declarations may appear within declarations ofstructures and unions, or even within a.B sizeofconstructioninside a dimension in another declaration!.PPThere are some difficulties in using a bottom-up parser,such as produced by Yacc, to handle constructions where a lotof left context information must be kept around.The problem is that the original PDP-11 compiler is top-down inimplementation, and some of the semantics of C reflect this.In a top-down parser, the input rules are restrictedsomewhat, but one can naturally associate temporarystorage with a rule at a very early stage in the recognitionof that rule.In a bottom-up parser, there is more freedom in thespecification of rules, but it is more difficult to know whatrule is being matched until the entire rule is seen.The parser described by.I cgram.cmakes effective use ofthe bottom-up parsing mechanism in some places (notably the treatmentof expressions), but struggles against therestrictions in others.The usual result is that it is necessary to run a stack of values``on the side'', independent of the Yacc value stack,in order to be able to store and access informationdeep within inner constructions, where the relationship of therules being recognized to the total picture is not yet clear..PPIn the case of declarations, the attribute information(type, etc.) for a declaration is carefullykept immediately to the left of the declarator(that part of the declaration involving the name).In this way, when it is time to declare the name, thename and the type information can be quickly brought together.The ``$0'' mechanism of Yacc is used to accomplish this.The result is not pretty, but it works.The storage class information changes more slowly,so it is kept in an external variable, and stacked ifnecessary.Some of the grammar could be considerably cleaned up by usingsome more recent features of Yacc, notably actions withinrules and the ability to return multiple values for actions..PPA stack is also used to keep track of the current locationto be branched to when a.B breakor.B continuestatement is processed..PPThis use of external stacks dates from the time when Yacc did not permitvalues to be structures.Some, or most, of this use of external stackscould be eliminated by redoing the grammar to use the mechanisms now provided.There are some areas, however, particularly the processing of structure, union,and enum declarations, functionprologs, and switch statement processing, when havingall the affected data together in an array speeds laterprocessing; in this case, use of external storage seems essential..PPThe.I cgram.yfile also contains some small functions used asutility functions in the parser.These include routines for saving case values and labels in processing switches,and stacking and popping values on the external stack described above..SHStorage Classes.PPC has a finite, but fairly extensive, number of storage classesavailable.One of the compiler design decisions was toprocess the storage class information totally in the first pass;by the second pass, this information must havebeen totally dealt with.This means that all of the storage allocation must take place in the firstpass, so that references to automatics andparameters can be turned into references to cells lying a certainnumber of bytes offset from certain machine registers.Much of this transformation is machine dependent, and stronglydepends on the storage class..PPThe classes include EXTERN (for externally declared, but not defined variables),EXTDEF (for external definitions), and similar distinctions forUSTATIC and STATIC, UFORTRAN and FORTRAN (for fortran functions) andULABEL and LABEL.The storage classes REGISTER and AUTO are obvious, asare STNAME, UNAME, and ENAME (for structure, union, and enumerationtags), and the associated MOS, MOU, and MOE (for the members).TYPEDEF is treated as a storage class as well.There are two special storage classes: PARAM and SNULL.SNULL is used to distinguish the case where no explicitstorage class has been given; before an entry is madein the symbol table the true storage class is discovered.Similarly, PARAM is used for the temporary entry in the symboltable made before the declaration of function parameters is completed..PPThe most complexity in the storage class process comes frombit fields.A separate storage class is kept for each width bit field;a.I kbit bit field has storage class.I kplus FIELD.This enables the size to be quickly recovered from the storage class..SHSymbol Table Maintenance..PPThe symbol table routines do far more than simply enternames into the symbol table;considerable semantic processing and checking is done as well.For example, if a new declaration comes in, it must be checkedto see if there is a previous declaration of the same symbol.If there is, there are many cases.The declarations may agree and be compatible (for example,an extern declaration can appear twice) in which case thenew declaration is ignored.The new declaration may add information (such as an explicit arraydimension) to an already present declaration.The new declaration may be different, but still correct (for example,an extern declaration of something may be entered,and then later the definition may be seen).The new declaration may be incompatible, but appear in an inner block;in this case, the old declaration is carefully hidden away, andthe new one comes into force until the block is left.Finally, the declarations may be incompatible, and an errormessage must be produced..PPA number of other factors make for additional complexity.The type declared by the user is not always the typeentered into the symbol table (for example, if an formal parameter to a functionis declared to be an array, C requires that this be changed intoa pointer before entry in the symbol table).Moreover, there are various kinds of illegal types thatmay be declared which are difficult tocheck for syntactically (for example,a function returning an array).Finally, there is a strange feature in C that requiresstructure tag names and member names for structures and unionsto be taken from a different logical symbol table than ordinary identifiers.Keeping track of which kind of name is involved is a bit of struggle(consider typedef names used within structure declarations, for example)..PPThe symbol table handling routines have been rewritten a number of times toextend features, improve performance, and fix bugs.They address the above problems with reasonable effectiveness
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -