📄 chapter 4 the parserw.htm
字号:
<TD>if</TD></TR>
<TR vAlign=top>
<TD><TT>thensy</TT></TD>
<TD>then</TD></TR>
<TR vAlign=top>
<TD><TT>endifsy</TT></TD>
<TD>endif</TD></TR>
<TR vAlign=top>
<TD><TT>assignsy</TT></TD>
<TD>:=</TD></TR>
<TR vAlign=top>
<TD><TT>timessy</TT></TD>
<TD>*</TD></TR>
<TR vAlign=top>
<TD><TT>idivsy</TT></TD>
<TD>/</TD></TR>
<TR vAlign=top>
<TD><TT>plussy</TT></TD>
<TD>+</TD></TR>
<TR vAlign=top>
<TD><TT>minussy</TT></TD>
<TD>-</TD></TR>
<TR vAlign=top>
<TD><TT>lparentsy</TT></TD>
<TD>(</TD></TR>
<TR vAlign=top>
<TD><TT>rparentsy</TT></TD>
<TD>)</TD></TR>
<TR vAlign=top>
<TD><TT>semicolonsy</TT></TD>
<TD>;</TD></TR>
<TR vAlign=top>
<TD><TT>eopsy</TT></TD>
<TD>End of program (Scanner detected an end of
file)</TD></TR></TBODY></TABLE></MENU>In a compiler, each token is treated as if
it were a terminal within a set of grammar productions. Recall that we use a
series of FSAs to break up an arbitrary text strig into a stream of descrete
tokens of some form like the one listed above. For the purposes of this example,
tokens will not contain any additional data.
<P> We also make use of two essential tokens. The first is <TT>badsy</TT>.
This token may appear for a variety of reasons. Mostly, this token will appear
because of an illegal character. It may also appear because of an unrecognizable
token.
<P> The other extra token, <TT>eopsy</TT> stands for "end of program".
Essentially this means that the scanner has detected the end of the input
stream, i.e., the end of the input file.
<P> <B>4.4.2.2 Parser Rules</B> <BR><!-------------------------------------------------------------------------------->When
coding a parser, we create a seperate procedure for each syntax diagram. Each
procedure is called a rule. By convention within this text, each rule will be
named like so: <PRE> Rule<I>S</I>()</PRE>where <I>S</I> is the
name of the syntax diagram for that rule. Coding a rule is a very simple matter.
Each branch in the path of a syntax diagram is an <TT>if</TT> statement. Each
loop is a <TT>while</TT>. At each decision point, we ask if <TT>token</TT> is
equal to some <I>x</I>sy.
<P> There are some things to remember about tokens, however.
<OL>
<LI>In addition, each rule will start out with the variable <TT>token</TT>
pre-primed. In other words, no rule should have to start out by calling
<TT>NextToken()</TT> to get rid of tokens of a previous rule. <I>This is
probably the most important rule of all.</I>
<LI>When a rule has determined by analysis of the current token and
FIRST-<I>S</I> that <TT>Rule<I>S</I>()</TT> ought to be called, it should
immediately call <TT>Rule<I>S</I>()</TT>. It should not eat the token. It
should let <TT>Rule<I>S</I>()</TT> or any derivation therof perform that task.
<LI>Whenever a rule recognizes a terminal, or token that is a part of its own
syntax it <I>must</I> call <TT>NextToken()</TT>. </LI></OL>This strict
discipline will assure us that our code does not lose track of tokens. Another
thing that we need is all the FIRSTs. These are usually implemented as a series
of pre-initialized sets. For the purposes of this example, the FIRSTs will be
implemented using macros that return a boolean result. The firsts are listed in
listing {FIRSTEX}. <PRE> #define FIRST_Factor (token == idconsy || token == lparentsy)
#define FIRST_Term (token == idconsy || token == lparentsy)
#define FIRST_Expression (token == idconsy || token == lparentsy)
#define FIRST_AssignmentStatement (token == idconsy)
#define FIRST_IfStatement (token == ifsy)
#define FIRST_Statements (token == idconsy || token == ifsy)
<B>Listing {FIRSTEX}.</B> The list of all firsts for our simple example.</PRE>In
a later section, we will use a different implementation for our set of FIRSTs.
The set we have here is designed to be used in the condition for an if statement
in C. We do this like so: <PRE> if (FIRST_IfStatement)
RuleIfStatement();</PRE>Next,
we need some error routine. We will discuss error handling techniques in a later
section. For now, we will use a function that tests the current token against
what the current rule is expecting. If there is not a match it prints a simple
message to <TT>stderr</TT> and then terminates the program. We will call this
function <TT>Accept()</TT>. <PRE> Accept(enum _Token t)
{
if (token != t) {
fprintf(stderr, "Syntax error");
exit(1);
}
else
NextToken();
}
<B>Listing {EXVALID}.</B></PRE>The last things
that we will need are the syntax diagrams for rules AssignmentStatement,
IfStatement, and Statements. See figure {SYNSTA}.
<CENTER><IMG src="Chapter 4 The Parserw.files/SYNSTA.gif"></CENTER><B>4.4.2.3
Coding the Syntax Diagrams</B> <BR><!-------------------------------------------------------------------------------->Let
us begin with the root rule, Statements. <PRE> /*** Statement ***************/
void RuleStatements()
{
while (FIRST_Statements) {
if (FIRST_AssignmentStatement) { // process an assignment
RuleAssignmentStatement();
}
else if (FIRST_IfStatement) { // process an if-statement
RuleIfStatement();
}
}
}</PRE>By convention, this rule assumes that
<TT>token</TT> is pre-initialized to one of the members of either
<TT>FIRST_AssignmentStatement</TT> or <TT>FIRST-IfStatement</TT>. Notice that if
the token is anything else that it will do a pass-through. This mimics the
behavior of the production,
<MENU>S -> <I>(nothing)</I></MENU>Rules AssignmentStatement and IfStatement
are equally easy. These are more interesting since they actually recognize
tokens as well as call other rules. <PRE> /*** Assignment **************/
void RuleAssignmentStatement()
{
Accept(idconsy); // recognize an identifier/constant
Accept(assignsy); // recognize a :=
RuleExpression(); // process an expression
Accept(semicolonsy); // recognize a ;
}
/*** If **********************/
void RuleIfStatement()
{
Accept(ifsy); // recognize an if
RuleExpression(); // process an expression
Accept(thensy); // recognize a then
RuleStatements(); // process a series of statements
Accept(endifsy); // process an endif
}</PRE>These rules are quite straightforward.
They verify the correct tokens, and call the rules that need to be called. By
this time, a pattern of coding should be evident. If you compare these functions
to the syntax diagrams that they represent, you will see. Let's move on.
Remember that the syntax diagrams for these next rules are at the start of
section 4.3. <PRE> /*** Term ********************/
void RuleTerm()
{
while(FIRST_Factor) { // is there a factor?
RuleFactor()
if (token == timessy)
NextToken(); // recognize a *
else if (token == idivsy)
NextToken(); // recognize a /
else
break; // otherwise exit the loop
}
}
/*** Expression **************/
void RuleExpression()
{
while(FIRST_Term) { // is there a term?
RuleTerm()
if (token == plussy)
NextToken(); // recognize a +
else if (token == minussy)
NextToken(); // recognize a -
else
break; // otherwise exit the loop
}
}</PRE>These two rules are very similar. Since
they must employ a looping strategy, they make use of the FIRSTs for the rules
that they loop on. Once <TT>RuleTerm()</TT> or <TT>RuleFactor()</TT> returns,
they verify that the current token is an appropriate operator. If no operator is
present, they return. Notice that if there is no match that these two rules do
not bother processing errors. The strategy is for each rule to parse only what
it knows, and leave the rest to whatever rule called it.
<P> There is another set called FOLLOW, which is the set of all tokens that
may follow a rule. The current token could be tested against this set in order
to make sure that whatever token happens to follow when the current rule returns
will be valid for the calling rule. Later on when we take a look at error
detection and handling, we will see a method for computing FOLLOW at runtime.
<P> The final rule for Factor is pretty straightforward. <PRE> /*** Factor ******************/
void RuleFactor()
{
if (token == idconsy)
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -