📄 chapter 4 the parserw.htm

📁 英文版编译器设计：里面详细介绍啦C编译器的设计
💻 HTM
📖 第 1 页 / 共 5 页
字号:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else if (token == lparentsy) {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleExpression();

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(rparentsy);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE>The last thing that we need is a function 
<TT>main()</TT> to initialize the scanner and start the parser. <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void main()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; //*** Perform any initialization code here, if needed...
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; //*** Parse!
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // Prime the scanner
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if(FIRST_Statements) {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleStatements();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // Call the root rule
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(eopsy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // make sure we hit the end of the file
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf(stderr, "Statements not found\n");
&nbsp;

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; //*** Do any cleanup work that needs to be done...
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE><TT>main()</TT> accomplishes its task in 
three easy steps. 
<OL>
  <LI>Before calling the first rule, we invoke <TT>NextToken()</TT> once. This 
  primes the variable <TT>token</TT>, so that our starting rule can begin its 
  work immediately. <BR>&nbsp; 
  <P> </P>
  <LI>Notice that before we call <TT>RuleStatements()</TT> we check to see if 
  the current token matches <TT>FIRST_Statements</TT>. This step is not 
  necessary for every grammar, especially when the rule checks to see if it is 
  starting on the right token. However, we have a special case with this 
  grammar. Remember that <TT>RuleStatements()</TT> just fell through if the 
  current token didn't match <TT>FIRST_IfStatement</TT> or 
  <TT>FIRST_AssignmentStatement</TT>. In this case, we need to check. Once we 
  have verified that <TT>token</TT> is in our set of FIRSTs, we begin parsing 
  the token stream by calling <TT>RuleStatements()</TT>. <BR>&nbsp; 
  <P> </P>
  <LI>Once <TT>RuleStatements()</TT> returns to <TT>main()</TT> we need to 
  verify that we have actually reached the end of the program. This is 
  accomplished by testing <TT>token</TT> to see if it equals <TT>eopsy</TT>. We 
  do this by making one last call to <TT>Accept()</TT>. </LI></OL>This completes 
our simple example. 
<H2>4.5 Error-Checking</H2><!-------------------------------------------------------------------------------->This 
parser works fine for perfect programmers who never make mistakes. However, due 
to human error, it is inevatible that a programmer will forget a semicolon, or 
type something like <TT>end if</TT> instead of <TT>endif</TT>, or make some 
other typing error. Our parser only handles one error at a time, and it doesn't 
even give very descriptive error messages, at that. 
<P>&nbsp;Ideally, we do not want the parser to run until it detects only one 
error and then stop. Usually there is more than one error in any program, 
especially if it is being compiled for the first time. The trick however, is for 
the parser and the stream of tokens to remain properly synchronized. Error 
checking within a recursive descent parser is handled by use of two sets in 
addition to FIRST, known as CURRENT and FOLLOW. <BR>&nbsp; <BR>&nbsp; 
<H3>4.5.1 CURRENT and FOLLOW</H3><!-------------------------------------------------------------------------------->Of 
all the sets, CURRENT is the easiest to maintain. For any given rule <I>S</I>, 
CURRENT-<I>S</I> is the set of all tokens that <I>S</I> expects to encounter but 
has not yet scanned (in other words, all tokens that happen later on in the 
rule). 
<P>Formally, the initial CURRENT for any given rule <I>S</I> contains all 
terminals accepted by <I>S</I> as well as the set of all FIRSTs for all other 
rules invoked by <I>S</I>. For example, take a look at rule I (IfStatement) in 
figure {SYNSTA}. The initial elements of the set CURRENT-IfStatement are <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {ifsy, thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements</PRE>As 
<TT>RuleIfStatement()</TT> accepts tokens or calls other rules, these elements 
will be deleted. For example. The first thing rule <TT>RuleIfStatement()</TT> 
does is accept an <TT>ifsy</TT>. Before accepting this token, it removes it from 
its CURRENT. The rule then calls <TT>RuleExpression()</TT>. Before calling it, 
<TT>RuleIfStatement()</TT> removes FIRST-Expression from its current. It then 
accepts a <TT>thensy</TT> removing it beforehand from current, and so on. 
Pseudocode for the function will look something like this: <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /*** If **********************/
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void RuleIfStatement()
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // Initialize current here.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {ifsy, thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(ifsy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize an if
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {thensy, endifsy} <B>union</B> FIRST-Statements
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleExpression();&nbsp;&nbsp;&nbsp;&nbsp; // process an expression

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {endifsy} <B>union</B> FIRST-Statements
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(thensy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize a then

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {endifsy}
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleStatements();&nbsp;&nbsp;&nbsp;&nbsp; // process a series of statements

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {}
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(endifsy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // process an endif
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE>The method of removing the token prior to 
accepting is important for later on. The method of repeatedly rebuilding current 
in this example is somewhat redundant. This was done on purpose. It certainly 
would be easier to remove the tokens as they were accepted, instead of 
rebuilding the set. Later, we will talk about a better way of maintaining 
current. 
<P>The set FOLLOW is constructed in a different way. The formal definition of 
FOLLOW for any rule <I>S</I> is the set of all terminals that can follow 
<I>S</I>. In the case of this compiler we will not try to compute the entire set 
of FOLLOW. Though it can be done, it is computationally expensive. We will use a 
shortcut. 
<P>&nbsp;A subset of FOLLOW-<I>S</I> can be computed at runtime by having each 
rule calling <I>S</I> pass in its set of CURRENT. FOLLOW is properly maintained 
by enforcing three rules: 
<OL>
  <LI>Each rule <I>S</I> recieves FOLLOW from the rule that called it. FOLLOW is 
  passed in to rule <I>S</I> <I>by value</I>. 
  <LI>Each rule <I>S</I> maintains (in some form) its own CURRENT, as previously 
  described. 
  <LI>When rule <I>S</I> calls some other rule <I>A</I>, it will pass in the the 
  set of FOLLOW that it received from its calling rule, unioned with CURRENT. 
  </LI></OL>Thus, FOLLOW for any rule <I>S</I> is a union of all previous and 
properly maintained CURRENTs. To continue our previous example, pseudocode for 
<TT>RuleIfStatement()</TT> will look something like this: <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /*** If **********************/
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void RuleIfStatement(Follow)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {ifsy, thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize an if
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(ifsy);

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // process an expression
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {thensy, endifsy} <B>union</B> FIRST-Statements
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleExpression(Follow <B>union</B> Current);

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // recognize a then
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {endifsy} <B>union</B> FIRST-Statements
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(thensy);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // process a series of statements
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {endifsy}
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RuleStatements(Follow <B>union</B> Current);

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // process an endif
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current = {}
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(endifsy);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE>Before we begin parsing, we initialize a 
set of CURRENT to contain only the end-of-file token, and pass it into our start 
rule. Our start rule takes this set as FOLLOW, and constructs its own CURRENT. 
When calling other rules it passes in CURRENT+FOLLOW. Thus, with each successive 
invocation of a rule the set of FOLLOW grows to include all of the elements 
belonging to previous sets of CURRENT. 
<P>&nbsp;<B>4.5.2 Accepting a Token</B> <BR><!-------------------------------------------------------------------------------->The 
possibility exists, each time a token is consumed, that the token desired will 
not be the same as the current token from the input stream. When this happens, 
the parser should signal an error and then attempt to recover. In order to 
properly recover, the parser makes use of both CURRENT and FOLLOW. 
<P>&nbsp;The process of accepting a token is simple: compare the desired token 
to the actual token from the token stream. If there is a match, then discard it 
and get the next token.Otherwise, continue eating tokens until the current token 
is an element of either CURRENT or FOLLOW. The only caveat for this procedure is 
that the desired token should be removed from CURRENT beforehand. In pseudocode, 
the algorithm would look like this: <BR>&nbsp; <BR>&nbsp; <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CURRENT -= desiredtoken;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // Remove desired token from CURRENT

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (token == desiredtoken)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // If token and desiredtoken match, accept!
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else {&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // Otherwise print an error and eat tokens
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print errmessage;

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SkipTo(CURRENT <B>union</B> FOLLOW);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE>We can put this into a function called 
<TT>Accept()</TT>, which looks something like this: <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; void Accept(Symbol sy, Set &amp;SyncSet, const char *_msg)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; //*** Token is not correct
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (token.sy != sy) {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // print error message

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SkipTo(SyncSet);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // SkipTo a token that can follow
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; //*** Token is correct
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NextToken();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</PRE>The first parameter to <TT>Accept()</TT> 
is the desired token. If it matches the current token from the scanner, then the 
next token is retrieved. If there is no match, then the next step is to print 
the error message and then repeatedly scan for tokens until there is a match to 
the set, <TT>SyncSet</TT>. Using <TT>Accept()</TT> is a simple matter. The 
current rule maintains CURRENT, composed of all tokens that it accepts, and the 
FIRSTS for all rules that it may call. Prior to accepting a token, it is removed 
from CURRENT, and then the function <TT>Accept()</TT> is called. The set, 
<TT>SyncSet</TT> from the previous snippet of pseudocode is composed of the 
union of CURRENT and FOLLOW. <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CURRENT -= desiredtoken;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(desiredtoken, CURRENT+FOLLOW, "Expected 'desiredtoken'");</PRE>This 
section and the previous section represent the two fundamental concepts for 
error checking and recovery in recursive descent parsing. In order to implement
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -