📄 chapter 4 the parserw.htm
字号:
NextToken();
else if (token == lparentsy) {
NextToken();
RuleExpression();
Accept(rparentsy);
}
}</PRE>The last thing that we need is a function
<TT>main()</TT> to initialize the scanner and start the parser. <PRE> void main()
{
//*** Perform any initialization code here, if needed...
//*** Parse!
NextToken(); // Prime the scanner
if(FIRST_Statements) {
RuleStatements(); // Call the root rule
Accept(eopsy); // make sure we hit the end of the file
}
else
printf(stderr, "Statements not found\n");
//*** Do any cleanup work that needs to be done...
}</PRE><TT>main()</TT> accomplishes its task in
three easy steps.
<OL>
<LI>Before calling the first rule, we invoke <TT>NextToken()</TT> once. This
primes the variable <TT>token</TT>, so that our starting rule can begin its
work immediately. <BR>
<P> </P>
<LI>Notice that before we call <TT>RuleStatements()</TT> we check to see if
the current token matches <TT>FIRST_Statements</TT>. This step is not
necessary for every grammar, especially when the rule checks to see if it is
starting on the right token. However, we have a special case with this
grammar. Remember that <TT>RuleStatements()</TT> just fell through if the
current token didn't match <TT>FIRST_IfStatement</TT> or
<TT>FIRST_AssignmentStatement</TT>. In this case, we need to check. Once we
have verified that <TT>token</TT> is in our set of FIRSTs, we begin parsing
the token stream by calling <TT>RuleStatements()</TT>. <BR>
<P> </P>
<LI>Once <TT>RuleStatements()</TT> returns to <TT>main()</TT> we need to
verify that we have actually reached the end of the program. This is
accomplished by testing <TT>token</TT> to see if it equals <TT>eopsy</TT>. We
do this by making one last call to <TT>Accept()</TT>. </LI></OL>This completes
our simple example.
<H2>4.5 Error-Checking</H2><!-------------------------------------------------------------------------------->This
parser works fine for perfect programmers who never make mistakes. However, due
to human error, it is inevatible that a programmer will forget a semicolon, or
type something like <TT>end if</TT> instead of <TT>endif</TT>, or make some
other typing error. Our parser only handles one error at a time, and it doesn't
even give very descriptive error messages, at that.
<P> Ideally, we do not want the parser to run until it detects only one
error and then stop. Usually there is more than one error in any program,
especially if it is being compiled for the first time. The trick however, is for
the parser and the stream of tokens to remain properly synchronized. Error
checking within a recursive descent parser is handled by use of two sets in
addition to FIRST, known as CURRENT and FOLLOW. <BR> <BR>
<H3>4.5.1 CURRENT and FOLLOW</H3><!-------------------------------------------------------------------------------->Of
all the sets, CURRENT is the easiest to maintain. For any given rule <I>S</I>,
CURRENT-<I>S</I> is the set of all tokens that <I>S</I> expects to encounter but
has not yet scanned (in other words, all tokens that happen later on in the
rule).
<P>Formally, the initial CURRENT for any given rule <I>S</I> contains all
terminals accepted by <I>S</I> as well as the set of all FIRSTs for all other
rules invoked by <I>S</I>. For example, take a look at rule I (IfStatement) in
figure {SYNSTA}. The initial elements of the set CURRENT-IfStatement are <PRE> {ifsy, thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements</PRE>As
<TT>RuleIfStatement()</TT> accepts tokens or calls other rules, these elements
will be deleted. For example. The first thing rule <TT>RuleIfStatement()</TT>
does is accept an <TT>ifsy</TT>. Before accepting this token, it removes it from
its CURRENT. The rule then calls <TT>RuleExpression()</TT>. Before calling it,
<TT>RuleIfStatement()</TT> removes FIRST-Expression from its current. It then
accepts a <TT>thensy</TT> removing it beforehand from current, and so on.
Pseudocode for the function will look something like this: <PRE> /*** If **********************/
void RuleIfStatement()
{
// Initialize current here.
Current = {ifsy, thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements
Current = {thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements
Accept(ifsy); // recognize an if
Current = {thensy, endifsy} <B>union</B> FIRST-Statements
RuleExpression(); // process an expression
Current = {endifsy} <B>union</B> FIRST-Statements
Accept(thensy); // recognize a then
Current = {endifsy}
RuleStatements(); // process a series of statements
Current = {}
Accept(endifsy); // process an endif
}</PRE>The method of removing the token prior to
accepting is important for later on. The method of repeatedly rebuilding current
in this example is somewhat redundant. This was done on purpose. It certainly
would be easier to remove the tokens as they were accepted, instead of
rebuilding the set. Later, we will talk about a better way of maintaining
current.
<P>The set FOLLOW is constructed in a different way. The formal definition of
FOLLOW for any rule <I>S</I> is the set of all terminals that can follow
<I>S</I>. In the case of this compiler we will not try to compute the entire set
of FOLLOW. Though it can be done, it is computationally expensive. We will use a
shortcut.
<P> A subset of FOLLOW-<I>S</I> can be computed at runtime by having each
rule calling <I>S</I> pass in its set of CURRENT. FOLLOW is properly maintained
by enforcing three rules:
<OL>
<LI>Each rule <I>S</I> recieves FOLLOW from the rule that called it. FOLLOW is
passed in to rule <I>S</I> <I>by value</I>.
<LI>Each rule <I>S</I> maintains (in some form) its own CURRENT, as previously
described.
<LI>When rule <I>S</I> calls some other rule <I>A</I>, it will pass in the the
set of FOLLOW that it received from its calling rule, unioned with CURRENT.
</LI></OL>Thus, FOLLOW for any rule <I>S</I> is a union of all previous and
properly maintained CURRENTs. To continue our previous example, pseudocode for
<TT>RuleIfStatement()</TT> will look something like this: <PRE> /*** If **********************/
void RuleIfStatement(Follow)
{
Current = {ifsy, thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements
// recognize an if
Current = {thensy, endifsy} <B>union</B> FIRST-Expression <B>union</B> FIRST-Statements
Accept(ifsy);
// process an expression
Current = {thensy, endifsy} <B>union</B> FIRST-Statements
RuleExpression(Follow <B>union</B> Current);
// recognize a then
Current = {endifsy} <B>union</B> FIRST-Statements
Accept(thensy);
// process a series of statements
Current = {endifsy}
RuleStatements(Follow <B>union</B> Current);
// process an endif
Current = {}
Accept(endifsy);
}</PRE>Before we begin parsing, we initialize a
set of CURRENT to contain only the end-of-file token, and pass it into our start
rule. Our start rule takes this set as FOLLOW, and constructs its own CURRENT.
When calling other rules it passes in CURRENT+FOLLOW. Thus, with each successive
invocation of a rule the set of FOLLOW grows to include all of the elements
belonging to previous sets of CURRENT.
<P> <B>4.5.2 Accepting a Token</B> <BR><!-------------------------------------------------------------------------------->The
possibility exists, each time a token is consumed, that the token desired will
not be the same as the current token from the input stream. When this happens,
the parser should signal an error and then attempt to recover. In order to
properly recover, the parser makes use of both CURRENT and FOLLOW.
<P> The process of accepting a token is simple: compare the desired token
to the actual token from the token stream. If there is a match, then discard it
and get the next token.Otherwise, continue eating tokens until the current token
is an element of either CURRENT or FOLLOW. The only caveat for this procedure is
that the desired token should be removed from CURRENT beforehand. In pseudocode,
the algorithm would look like this: <BR> <BR> <PRE> CURRENT -= desiredtoken; // Remove desired token from CURRENT
if (token == desiredtoken) // If token and desiredtoken match, accept!
NextToken();
else { // Otherwise print an error and eat tokens
print errmessage;
SkipTo(CURRENT <B>union</B> FOLLOW);
}</PRE>We can put this into a function called
<TT>Accept()</TT>, which looks something like this: <PRE> void Accept(Symbol sy, Set &SyncSet, const char *_msg)
{
//*** Token is not correct
if (token.sy != sy) {
// print error message
SkipTo(SyncSet); // SkipTo a token that can follow
}
//*** Token is correct
else
NextToken();
}</PRE>The first parameter to <TT>Accept()</TT>
is the desired token. If it matches the current token from the scanner, then the
next token is retrieved. If there is no match, then the next step is to print
the error message and then repeatedly scan for tokens until there is a match to
the set, <TT>SyncSet</TT>. Using <TT>Accept()</TT> is a simple matter. The
current rule maintains CURRENT, composed of all tokens that it accepts, and the
FIRSTS for all rules that it may call. Prior to accepting a token, it is removed
from CURRENT, and then the function <TT>Accept()</TT> is called. The set,
<TT>SyncSet</TT> from the previous snippet of pseudocode is composed of the
union of CURRENT and FOLLOW. <PRE> CURRENT -= desiredtoken;
Accept(desiredtoken, CURRENT+FOLLOW, "Expected 'desiredtoken'");</PRE>This
section and the previous section represent the two fundamental concepts for
error checking and recovery in recursive descent parsing. In order to implement
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -