📄 guide.tex
字号:
will print just the names of the checked-out files.You may want to create aliases for these, e.g.\ {\tt llrw} and{\tt lsrw}.I frequently type {\tt rcsdiff `lsrw`} to find out what changes I'vemade recently, and {\tt ci -u `lsrw`} to check them all in.\end{description}\section{Creating and Running Your Copy of the Compiler}The root directory of the PL/0 distributioncontains a copy of the source for the PL/0compiler, together with a {\tt Makefile}, a {\tt README} file, and a{\tt NEWUSER} script.The {\tt README} file explains how to use the {\tt NEWUSER} script.This script will copy the source, create an RCS archive of it, run {\tt make}to build the compiler, and create a test directory.The {\tt Makefile} places the final PL/0 compiler in a file named{\tt pl0} .The compiler reads a PL/0 source program from standard input or froma file named in its command-line arguments.In the latter case, the name must end with ``{\tt .0}''.The compiler printserror messages and diagnostics to standard output (fatal errors tostandard error), and writes MIPS~I assembler target codeto a file whose name is created by replacing the``{\tt .0}'' in the source file name with ``{\tt .s}''or, if compiling standard input, to a file named {\tt plzero.s} .With a {\tt -sgi} command-line argument, the compiler generates the MIPSdialect understood by assemblers for Silicon Graphics computers.By default, or with a {\tt -spim} argument, it generates the dialectunderstood by James Larus's MIPS interpreter, {\tt spim}, from theUniversity of Wisconsin.To compile and run a PL/0 program under {\tt spim}, type:\begin{verbatim} pl0 foo.0 spim -file foo.s\end{verbatim}The pl0 compiler accepts a few additional arguments:\begin{itemize}\item {\tt -g}, Include partial symbol table information in the target code, to make it a little easier to debug PL/0 programs.\item {\tt -C}, Include PL/0 source lines in the target code as comments.\item {\tt -O+c}, Evaluate most constant expressions at compile time.\item {\tt -O-p}, Do not do peephole optimization. At present, the only optimization performed is to avoid moving the contents of a register into itself.\end{itemize}\section{PL/0 Compilation Phases}The PL/0 compiler consists of just under 11,000 lines ofheavily-commented C++, plus automatically-generated scanner and parsertables. Constants in the code are printed entirely in capital letters,with sub-words separated by underscores, e.g.\ {\tt SCRATCH\_REG}. Allother identifiers are printed entirely in lower case, with sub-wordsseparated by underscores. User-defined type and class names end with{\tt \_t}, e.g. {\tt token\_attrib\_t}.\subsection{Source Buffering and Diagnostic Messages}Files {\tt inpbuf.[h,cc]}The compiler reads its source on demand, and keeps it all buffered in a linkedlist. Together with each line, it keeps a list of diagnostic messagesassociated with that line.Buffering allows messages to be generated out of order, but printed in order.The later phases of the compiler keep track, for each syntacticconstruct, of the location (line and column) at which that construct began.Messages pertaining to that construct can then be issued at the appropriatelocation.Each diagnostic has a key letter in column 2 that indicates what kind of amessage it is.\begin{itemize}\item {\tt S} means syntax repair.\item {\tt W} means warning.\item {\tt E} means semantic error.\end{itemize}Semantic errors inhibit code generation; syntax repairs and warningsdo not.\subsection{Scanner}Files {\tt scantab.[h,cc]}, {\tt scanner.[h,cc]}The scanner is a straightforward finite automaton.It keeps a copy of tokens whose characters are significant (identifiersand numbers).Its main entry point is the function {\tt token\_get}, called by the parser.The only tricky part of the scanner is the mechanism for backtracking,employed when one token in the language is a prefix of another,but more than one character of lookahead is required todistinguish between the two.(For example, consider {\tt 3.1459} and {\tt 3..14} in Pascal, or{\tt DO5I=1,10} and {\tt DO5I=1.10} in Fortran.)No such tokens occur in PL/0, but they could easily be added, and thescanner will support them.\subsection{Parser and Syntax Error Repair Routines}Files {\tt parsetab.[h,cc]}, {\tt parser.[h,cc]}The parser is a straightforward LL(1) pushdown automaton, augmented witherror repair routines and with markers to support automaticmanagement of space for attributes (see below).The error repair routines employa locally least cost technique due to Fischer, Milton, and Quiring.Basically, they calculate the ``cheapest''set of insertions and/or deletions that will allow the parser to consume onemore token of real input.Insertion and deletion costs are specified in the {\tt grammar} file.The choice among alternative repairs can be tuned by changing therelative costs of various insertions and deletions.Generally speaking, it should be easy to insert or delete tokens thata programmer is likely to forget or to type by mistake, and hard toinsert or delete tokens that require matching constructs later in the program,since incorrect repairs of this type are guaranteed to lead to future errors.\subsection{Syntax Tree Construction}Files {\tt attributes.[h,cc]}The semantic action routines in the grammar file do one thing only:construct an abstract syntax tree.They perform no symbol table manipulations, nor do they check for anyerror conditions.Space for attributes is maintained in a so-called``attribute stack'',which is pushed and popped automatically at appropriate times in theparse.At any particular point in time, the attribute stack contains avertical slice of the parse tree: one record for each symbol in eachproduction between the current node and the root.The full parse tree is never in memory at once.When the parser predicts a production, it calls into the attributemanager to push records onto the attribute stack for all the symbols onthe right-hand side (RHS).It also pushes a marker onto the parse stack.The marker allows it to tell when it has popped all the RHS symbols offthe parse stack, at which point it calls into the attribute manager topop the RHS symbols off the attribute stack.Each element of the attribute stack contains either the syntheticattributes of a token from the scanner or a pointer to a syntax treenode (or, often, nothing at all).The purpose of the action routines in the grammar is to pull the tokeninformation out of the attribute stack, construct appropriatepieces of the syntax tree, and hang them off of records in theattribute stack that will be available to routines later in the parse(higher in the parse tree).When parsing is completed, the root of the complete syntax tree isfound in the attribute record of the goal symbol, the last remainingrecord on the attribute stack.\subsection{Semantic Checks}\label{sec-semantic-checks}Files {\tt semantics.[h,cc]}The semantic checker enters identifiers into the (scoped) symbol table, and performs all static semantic checks.For PL/0, the only checks are the following:\begin{itemize}\item is every identifier declared before use?\item is no identifier declared more than once in the same scope?\item is every left-hand side of an assignment a variable?\item is every identifier in an expression a variable or constant?\item is every called identifier a subroutine?\end{itemize}In addition, the PL/0 compiler performs the following ``sanity checks'',and issues warnings (not error messages) if they fail:\begin{itemize}\item is every variable both written and read?\item is every constant read?\item is every subroutine called?\end{itemize}\subsection{Code Improvement (Optimization)}Files {\tt optimize.[h,cc]}The purpose of the code improver is to make machine-independent modificationsto the abstract syntax tree that are likely to result in code that issignificantly faster and/or smaller.The main entry point is {\tt optimize\_syntax\_tree}.At present it doesn't do much: it computes the values of arithmeticexpressions and conditions consisting entirely of constants.It does not use arithmetic identities to re-order expressions in order touncover additional constants, nor does it perform copy propagation.More to the point, it does not perform any of the other standardmachine-independent optimizations, such as identification and exploitationof loop invariants, induction variables, common subexpressions, live anddead variables, or available and very busy expressions.\subsection{Code Generation}Files {\tt codegen.[h,cc]}, {\tt mips.[h,cc]}There are two sub-phases to code generation.The first assigns frame pointer offsets for local variables, and allocatesregisters; the second generates code.Register allocation is extremely naive: the registers of the targetmachine not reserved for any other purpose are used as a LIFO evaluationstack.Most programs use nowhere near the full register set.Any program requiring more than the available number of registers producesa fatal compiler error.A good compiler (with a code improver and a smart register allocator) wouldkeep many variables in registers much of the time, spilling them to memoryonly when absolutely necessary.For the purposes of code generation, additional fields are required inboth the nodes of the syntax tree and the records of the symbol table.These fields are maintained in separate records so that they neednot be visible to the early phases.The main entry point for code generation is {\tt generate\_code}, in {\tt codegen.cc}. The routines in {\tt mips.[h,cc]} encapsulate the MIPS assembler syntax by overloading the function {\tt mips\_op}.\subsection{Symbol Table}Files {\tt symtab.[h,cc]}The symbol table has a wide interface (many subroutines declared in symtab.h),but a simple internal structure.There is a single main hash table, keyed by \{name, scope\} pairs,and a stack of open scopes.To look up an element, the symbol table routines traverse the open scope stackfrom top to bottom, probing the hash table at each level to see if the desiredname was declared in that scope.Faster schemes are possible, but this one is simple, works well, and isnot outlandishly slow.\section{Compiler Classes}\subsection{Syntax Tree Nodes}Every node in the attributed syntax tree is an object belonging to one ofthe concrete subtypes of the abstract type {\tt syntax\_tree\_t}.Figure~\ref{fig-syntax} shows the hierarchy of PL/0 syntax tree classes.\begin{figure}\centerline{\epsfxsize\textwidth\epsfbox{syntax.eps}}\caption{Syntax tree classes.}\label{fig-syntax}\end{figure}Each subclass of {\tt syntax\_tree\_t} must implement the followingmember functions:\begin{itemize}\item constructor (in {\tt attributes.cc})\item {\tt check\_semantics} (in {\tt semantics.cc}) \\ Checks the semantics of the node and any of its children.\item {\tt optimize} (in {\tt optimize.cc}) \\ Performs any code optimizations on a node and its children.\item {\tt allocate\_registers} (in {\tt codegen.cc}) \\ Allocates any registers used by the node and lets its children also allocate their needed registers. Returns a pointer to its {\tt syntax\_code\_t} structure which details register usage and where the node's parent can locate any results generated by the node.\item {\tt generate\_code} (in {\tt codegen.cc}) \\ Generates any inline assembly code need by the node and its children and dumps code to the output file using the {\tt mips\_op} functions.\item {\tt generate\_lcode} (in {\tt codegen.cc}) \\ Generates any inline code needed to access a node's lvalue.\item {\tt dump\_object\_data} (in {\tt attributes.cc}) \\ Dumps all useful debugging information about a syntax tree node to {\tt stdout}.\item destructor (in {\tt attributes.cc}) \\ Deallocates the node and its children.\end{itemize}\subsection{Symbol Table Entries}All entries in the symbol table must be members of a concrete type whichis derived from the abstract type {\tt symbol\_entry\_t}.Figure~\ref{fig-symbol} shows the PL/0 symbol table entry classes.\begin{figure}\centerline{%\epsfysize 2in\epsfbox{symbol.eps}}\caption{Symbol table classes.}\label{fig-symbol}\end{figure}Each subclass of {\tt symbol\_entry\_t} must implement the followingmember functions:\begin{itemize}\item constructor (in {\tt symtab.cc})\item {\tt allocate\_space} (in {\tt codegen.cc}) \\ Allocates any space either in the stack frame or globally which will be needed to represent the symbol in the compiled program.\item {\tt generate\_data} (in {\tt codegen.cc}) \\ Outputs any pseudo-symbols or other code for creating the symbol in the output assembly file. For example, global variables output the pseudo-opcode ``{\tt .dword 0}''.\item {\tt generate\_code} (in {\tt codegen.cc}) \\ Generates any inline code needed to access a symbol.\item {\tt dump\_object\_data} (in {\tt symtab.cc}) \\ Dumps all useful debugging information about a symbol table entry to {\tt stdout}.\end{itemize}No destructor is needed for {\tt symbol\_entry\_t}; once created, symboltable entries remain allocated through the end of the compiler run.\section*{Acknowledgments}Michael Scott wrote the original version of the PL/0 compiler (and the {\ttmakeparse} and {\tt makescan} tools) in Pascal in the summer of 1992. Heborrowed some of the code from his compiler for the Lynx programminglanguage, which in turn contained pieces of his undergraduate compilercourse project from the University of Wisconsin. He based the generalstructure of the parser and error repair routines on code written by JonMauney to accompany thedistribution of {\tt fmq}. In the fall of 1993 Micha\l\ Cierniak replacedthe original code generator (which produced a subset of C) with code togenerate MIPS assembly language. Micha\l\ also introduced a simple codeimprovement phase. Galen Hunt re-wrote the compiler in C++ in the fall of1994, introducing the {\tt symbol\_entry\_t} and {\tt syntax\_tree\_t}class hierarchies, substantially simplifying code generation via overloadedfunctions, and introducing a variety of other improvements. Michael Scottcreated the {\tt mungegrammar} tool in the summer of 1995, replacing anearlier pre-processor (called {\tt prefmq}) that did not produce {\tttokens.h} or {\tt scangen.in}.\bibliography{pragmatics}\bibliographystyle{plain}\end{document}
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -