📄 chapter 4 the parserw.htm

📁 英文版编译器设计：里面详细介绍啦C编译器的设计
💻 HTM
📖 第 1 页 / 共 5 页
字号:
      <TD>if</TD></TR>
    <TR vAlign=top>
      <TD><TT>thensy</TT></TD>
      <TD>then</TD></TR>
    <TR vAlign=top>
      <TD><TT>endifsy</TT></TD>
      <TD>endif</TD></TR>
    <TR vAlign=top>
      <TD><TT>assignsy</TT></TD>
      <TD>:=</TD></TR>
    <TR vAlign=top>
      <TD><TT>timessy</TT></TD>
      <TD>*</TD></TR>
    <TR vAlign=top>
      <TD><TT>idivsy</TT></TD>
      <TD>/</TD></TR>
    <TR vAlign=top>
      <TD><TT>plussy</TT></TD>
      <TD>+</TD></TR>
    <TR vAlign=top>
      <TD><TT>minussy</TT></TD>
      <TD>-</TD></TR>
    <TR vAlign=top>
      <TD><TT>lparentsy</TT></TD>
      <TD>(</TD></TR>
    <TR vAlign=top>
      <TD><TT>rparentsy</TT></TD>
      <TD>)</TD></TR>
    <TR vAlign=top>
      <TD><TT>semicolonsy</TT></TD>
      <TD>;</TD></TR>
    <TR vAlign=top>
      <TD><TT>eopsy</TT></TD>
      <TD>End of program (Scanner detected an end of 
file)</TD></TR></TBODY></TABLE></MENU>In a compiler, each token is treated as if 
it were a terminal within a set of grammar productions. Recall that we use a 
series of FSAs to break up an arbitrary text strig into a stream of descrete 
tokens of some form like the one listed above. For the purposes of this example, 
tokens will not contain any additional data. 
<P>&nbsp;We also make use of two essential tokens. The first is <TT>badsy</TT>. 
This token may appear for a variety of reasons. Mostly, this token will appear 
because of an illegal character. It may also appear because of an unrecognizable 
token. 
<P>&nbsp;The other extra token, <TT>eopsy</TT> stands for "end of program". 
Essentially this means that the scanner has detected the end of the input 
stream, i.e., the end of the input file. 
<P>&nbsp;<B>4.4.2.2 Parser Rules</B> <BR><!-------------------------------------------------------------------------------->When 
coding a parser, we create a seperate procedure for each syntax diagram. Each 
procedure is called a rule. By convention within this text, each rule will be 
named like so: <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rule<I>S</I>()</PRE>where <I>S</I> is the 
name of the syntax diagram for that rule. Coding a rule is a very simple matter. 
Each branch in the path of a syntax diagram is an <TT>if</TT> statement. Each 
loop is a <TT>while</TT>. At each decision point, we ask if <TT>token</TT> is 
equal to some <I>x</I>sy. 
<P>&nbsp;There are some things to remember about tokens, however. 
<OL>
  <LI>In addition, each rule will start out with the variable <TT>token</TT> 
  pre-primed. In other words, no rule should have to start out by calling 
  <TT>NextToken()</TT> to get rid of tokens of a previous rule. <I>This is 
  probably the most important rule of all.</I> 
  <LI>When a rule has determined by analysis of the current token and 
  FIRST-<I>S</I> that <TT>Rule<I>S</I>()</TT> ought to be called, it should 
  immediately call <TT>Rule<I>S</I>()</TT>. It should not eat the token. It 
  should let <TT>Rule<I>S</I>()</TT> or any derivation therof perform that task. 

  <LI>Whenever a rule recognizes a terminal, or token that is a part of its own 
  syntax it <I>must</I> call <TT>NextToken()</TT>. </LI></OL>This strict 
discipline will assure us that our code does not lose track of tokens. Another 
thing that we need is all the FIRSTs. These are usually implemented as a series 
of pre-initialized sets. For the purposes of this example, the FIRSTs will be 
implemented using macros that return a boolean result. The firsts are listed in 
listing {FIRSTEX}. <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define FIRST_Factor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (token == idconsy || token == lparentsy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define FIRST_Term&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (token == idconsy || token == lparentsy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define FIRST_Expression&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (token == idconsy || token == lparentsy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define FIRST_AssignmentStatement (token == idconsy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define FIRST_IfStatement&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (token == ifsy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define FIRST_Statements&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (token == idconsy || token == ifsy)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <B>Listing {FIRSTEX}.</B>&nbsp; The list of all firsts for our simple example.</PRE>In 
a later section, we will use a different implementation for our set of FIRSTs. 
The set we have here is designed to be used in the condition for an if statement 
in C. We do this like so: <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (FIRST_IfStatement)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleIfStatement();</PRE>Next, 
we need some error routine. We will discuss error handling techniques in a later 
section. For now, we will use a function that tests the current token against 
what the current rule is expecting. If there is not a match it prints a simple 
message to <TT>stderr</TT> and then terminates the program. We will call this 
function <TT>Accept()</TT>. <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(enum _Token t)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (token != t) {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fprintf(stderr, "Syntax error");
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; exit(1);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <B>Listing {EXVALID}.</B></PRE>The last things 
that we will need are the syntax diagrams for rules AssignmentStatement, 
IfStatement, and Statements. See figure {SYNSTA}. 
<CENTER><IMG src="Chapter 4 The Parserw.files/SYNSTA.gif"></CENTER><B>4.4.2.3 
Coding the Syntax Diagrams</B> <BR><!-------------------------------------------------------------------------------->Let 
us begin with the root rule, Statements. <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /*** Statement ***************/
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void RuleStatements()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; while (FIRST_Statements) {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (FIRST_AssignmentStatement) {&nbsp;&nbsp;&nbsp; // process an assignment
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleAssignmentStatement();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else if (FIRST_IfStatement) {&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // process an if-statement
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleIfStatement();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE>By convention, this rule assumes that 
<TT>token</TT> is pre-initialized to one of the members of either 
<TT>FIRST_AssignmentStatement</TT> or <TT>FIRST-IfStatement</TT>. Notice that if 
the token is anything else that it will do a pass-through. This mimics the 
behavior of the production, 
<MENU>S -&gt; <I>(nothing)</I></MENU>Rules AssignmentStatement and IfStatement 
are equally easy. These are more interesting since they actually recognize 
tokens as well as call other rules. <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /*** Assignment **************/
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void RuleAssignmentStatement()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(idconsy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize an identifier/constant
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(assignsy);&nbsp;&nbsp;&nbsp;&nbsp; // recognize a :=

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleExpression();&nbsp;&nbsp;&nbsp;&nbsp; // process an expression

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(semicolonsy);&nbsp; // recognize a ;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /*** If **********************/
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void RuleIfStatement()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(ifsy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize an if

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleExpression();&nbsp;&nbsp;&nbsp;&nbsp; // process an expression

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(thensy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize a then

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleStatements();&nbsp;&nbsp;&nbsp;&nbsp; // process a series of statements

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(endifsy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // process an endif
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE>These rules are quite straightforward. 
They verify the correct tokens, and call the rules that need to be called. By 
this time, a pattern of coding should be evident. If you compare these functions 
to the syntax diagrams that they represent, you will see. Let's move on. 
Remember that the syntax diagrams for these next rules are at the start of 
section 4.3. <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /*** Term ********************/
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void RuleTerm()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; while(FIRST_Factor) {&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // is there a factor?
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleFactor()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (token == timessy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize a *
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else if (token == idivsy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize a /
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; break;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // otherwise exit the loop
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /*** Expression **************/
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void RuleExpression()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; while(FIRST_Term) {&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // is there a term?
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleTerm()

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (token == plussy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize a +
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else if (token == minussy)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize a -
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; break;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // otherwise exit the loop
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE>These two rules are very similar. Since 
they must employ a looping strategy, they make use of the FIRSTs for the rules 
that they loop on. Once <TT>RuleTerm()</TT> or <TT>RuleFactor()</TT> returns, 
they verify that the current token is an appropriate operator. If no operator is 
present, they return. Notice that if there is no match that these two rules do 
not bother processing errors. The strategy is for each rule to parse only what 
it knows, and leave the rest to whatever rule called it. 
<P>&nbsp;There is another set called FOLLOW, which is the set of all tokens that 
may follow a rule. The current token could be tested against this set in order 
to make sure that whatever token happens to follow when the current rule returns 
will be valid for the calling rule. Later on when we take a look at error 
detection and handling, we will see a method for computing FOLLOW at runtime. 
<P>&nbsp;The final rule for Factor is pretty straightforward. <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /*** Factor ******************/
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void RuleFactor()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (token == idconsy)
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -