📄 chapter 4 the parserw.htm
字号:
such a parser, we require the services of one additional tool: a facility to
manage sets.
<H3>4.5.3 Tools to Manage Sets</H3><!-------------------------------------------------------------------------------->The
SAL compiler has a very robust set class, aptly named <TT>Set</TT>, which has
been built for the purpose of managing sets of tokens, or more precisely, sets
of symbols.
<P> The set has no specific order, and only one of each item is allowed in
the set at any given time. Its members are of the type <TT>symbol</TT>, which as
you may recall from section 3.2.2 of the previous chapter, is the principle
element of <TT>token</TT>. <TT>Set</TT> has three constructors:
<MENU><B><TT>Set()</TT></B> <BR>Basic default constructor. All constructors
pre-initialize the set to the empty set.
<P> <B><TT>Set(symbol s)</TT></B> <BR>Initializes the set to contain only
the symbol, <TT>s</TT>.
<P> <B><TT>Set(Set &other)</TT></B> <BR>Copy constructor. Copies
contents of <TT>other</TT> to <TT>*this</TT>.
<P> <B><TT>Set(symbol s, ...)</TT></B> <BR>Takes a series of symbols
terminated with <TT>badsy</TT>. <BR> <BR> </P></MENU>In these and all
definitions, <TT>symbol</TT> is the set of all SAL symbols as defined in the
file <TT>SAL-Scanner.h</TT>. The last constructor is useful for defining a set
and initializing it all at once. For example: <PRE> Set ForSet(forsy, tosy, bysy, dosy, loopsy, badsy);</PRE>Make
sure that the symbol <TT>badsy</TT> is always at the end of the argument list
when using this constructor. In addition to these constructors, there is also a
host of functions to operate on the <TT>Set</TT> class.
<MENU><B><TT>void AddMember(symbol s)</TT></B> <BR>Adds a single
<TT>symbol</TT> to the <TT>Set</TT>. This function is declared as inline.
<P> <B><TT>void DelMember(symbol s)</TT></B> <BR>Deletes a single
<TT>symbol</TT> from the <TT>Set</TT>. This function is declared as inline.
<P> <B><TT>bool TestMember(symbol s)</TT></B> <BR>Tests a single member
of the set. Returns <TT>true</TT> if the member is set, <TT>false</TT>
otherwise. This function is declared as inline.
<P> <B><TT>void Empty()</TT></B> <BR>Returns true if the set is empty.
<P> <B><TT>void Assign(const Set &other)</TT></B> <BR>This function
is called by the copy constructor, and basically does the same job. The
contents of <TT>other</TT> are copied into the set.
<P> <B><TT>bool Equals(Set &other) const</TT></B> <BR>returns
<TT>true</TT> if <TT>other</TT> is the same.
<P> <B><TT>void Inverse()</TT></B> <BR>Toggles all the contents of a set.
<P> <B><TT>void Diff(const Set &other)</TT></B> <BR>Removes from a
set all the members found in <TT>other</TT>.
<P> <B><TT>void Intersect(const Set &other)</TT></B> <BR>Performs a
set intersection with <TT>other</TT>.
<P> <B><TT>void Union(const Set &other)</TT></B> <BR>Performs a set
union with <TT>other</TT>.
<P> <B><TT>void Clear()</TT></B> <BR>Clears all members from the set.
<P> <B><TT>void Add(symbol s, ...)</TT></B> <BR>Adds a long series of
<TT>symbol</TT> arguments to the set. Argument list <I>must</I> be terminated
by <TT>badsy</TT>.
<P> <B><TT>void Del(symbol s, ...)</TT></B> <BR>Deletes a long series of
<TT>symbol</TT> arguments to the set. Argument list <I>must</I> be terminated
by <TT>badsy</TT>. Be careful using these two functions.
<P> <B><TT>void SetTo(symbol s, ...)</TT></B> <BR>Deletes all elements of
the set and resets it to the argument list. This set is useful for zeroing out
a set and setting it to a list of items in one step. Argument list <I>must</I>
be terminated by <TT>badsy</TT>. <BR> <BR> </P></MENU>As you can see,
the set of functions to perform operations on the set class is fairly complete.
In addition to these functions, a series of operators has also been defined.
<BR> <BR>
<MENU><B><TT>Set &operator = (const Set other)</TT></B> <BR>Performs
assignment. This operator is declared as inline.
<P> <B><TT>Set &operator &= (const Set other)</TT></B>
<BR>Performs intersection-assignment. This operator is declared as inline.
<P> <B><TT>Set &operator |= (const Set other)</TT></B> <BR>Performs a
union-assignment. This operator is declared as inline.
<P> <B><TT>Set &operator -= (const Set other)</TT></B> <BR>Performs a
diff-assignment on the set with <TT>other</TT>. This operator is declared as
inline.
<P> <B><TT>Set &operator <<= (const symbol s)</TT></B> <BR>Adds
symbol <TT>s</TT> to a set. This operator is declared as inline.
<P> <B><TT>Set &operator >>= (const symbol s)</TT></B>
<BR>Deletes symbol <TT>s</TT> to a set. This operator is declared as inline.
<P> <B><TT>bool operator == (Set a, Set b)</TT></B> <BR>Compares two
sets.
<P> <B><TT>Set operator ~ (Set a)</TT></B> <BR>Returns the inverse of the
set (i.e., does not inverse the set, itself, but merely returns a value
inverse to the original set).
<P> <B><TT>Set operator + (Set a, Set b)</TT></B> <BR><B><TT>Set operator
+ (Set a, Symbol s)</TT></B> <BR><B><TT>Set operator + (Symbol a, Symbol
b)</TT></B> <BR>Returns a set that is a union of the two parameters. Works the
same as the | operator.
<P> <B><TT>Set operator - (Set a, Set b)</TT></B> <BR><B><TT>Set operator
- (Set a, Symbol s)</TT></B> <BR>Returns the difference, <TT>a</TT> -
<TT>b</TT>, or <TT>a</TT> - <TT>s</TT>.
<P> <B><TT>Set operator & (Set a, Set b)</TT></B> <BR>Returns the
intersection of <TT>a</TT> and <TT>b</TT>.
<P> <B><TT>Set operator | (Set a, Set b)</TT></B> <BR>Returns the union
of <TT>a</TT> and <TT>b</TT>.
<P> <B><TT>Set operator << (Set a, symbol s)</TT></B> <BR>Returns
the set, <TT>a</TT> | {<TT>s</TT>}.
<P> <B><TT>Set operator >> (Set a, symbol s)</TT></B> <BR>Returns
the set, <TT>a</TT> - {<TT>s</TT>}. <BR> <BR> </P></MENU>The
definition of CURRENT that we have discussed previously has been a purely
mathematical one. As computer scientists however, we know that mathematics and
practical implemtntation are not always synonymous. The set class that we
presented here is both parsimonious and fast, but it has one major limitation:
it does not allow duplicate items.
<P> In the case of FOLLOW we can overcome this limitation by always passing
the set in <I>by value</I> each time a rule is called. Thus, a rule is
guaranteed that its FOLLOW will never be corrupted by any rule that it calls.
<P> A more subtle problem arises in maintaining CURRENT, which can take on
two related forms:
<OL>
<LI>A rule contains a loop.
<LI>A rule calls another more than once, or uses the same token more than
once. </LI></OL>Since the our set class does not allow duplicates, we have to be
careful that when we remove used tokens we do not remove items that might appear
again, later on in the rule. This is best accomplished by a mixture of simply
deleting the accepted token, and scrapping CURRENT and rebuilding it from
scratch. The first case is feasible when we are <I>sure</I> that there is no
duplication in CURRENT, i.e., we know that we can remove element x or set
FIRST-X, because we know that x will not appear later on in the rule, or that
none of the elements of FIRST-X will appear later on in the rule. If we know
that a single token will appear later on, then we need to leave it in CURRENT
until we no longer need it. On the other hand, if one or more elements of
FIRST-X will be needed later on, then CURRENT should be rebuilt from scratch.
<BR> <BR>
<H3>4.5.4 SAL Parser and Error-Checking Facilities</H3><!-------------------------------------------------------------------------------->Let's
put this all together and rework the previous example to contain error-checking.
At this time we should also adapt this example so that it looks more like the
implementation of the SAL compiler. The SAL compiler contains many facilities to
aid error-checking and synchronization, as well as the set class that we have
just presented.
<P> In the SAL compiler, all the sets of FIRST are defined as global
constants. There are three major groups of #defines for constructing sets. There
is a comma series, best used when calling the constructor of a set, and the
other much safer series uses the left-shift operator. All members of the comma
series are prefixed by an <TT>_f</TT>, and all members of the left-shift series
are prefixed by a single <TT>f</TT>. In both cases, f stands for first. For our
example, the FIRSTS are predefined as follows. <BR> <BR> <PRE> #define _fFactor idconsy, lparentsy
#define _fTerm idconsy, lparentsy
#define _fExpression idconsy, lparentsy
#define _fAssignmentStatement idconsy
#define _fIfStatement ifsy
#define _fStatements idconsy, ifsy
#define fFactor idconsy << lparentsy
#define fTerm idconsy << lparentsy
#define fExpression idconsy << lparentsy
#define fAssignmentStatement idconsy
#define fIfStatement ifsy
#define fStatements idconsy << ifsy
const Set FirstFactor(_fAssignmentStatement, badsy);
const Set FirstTerm(_fTerm, badsy);
const Set FirstExpression(_fExpression, badsy);
const Set FirstAssignmentStatement(_fAssignmentStatement, badsy);
const Set FirstIfStatement(_fIfStatement, badsy);
const Set FirstStatements(_fStatements, badsy);</PRE>The
<TT>_f</TT> group is used in variable argument lists. You can see an example of
its use in the third group. This group is really useful for initializing
CURRENT. This group is also useful when calling <TT>Set::SetTo()</TT>.
<P> The <TT>f</TT> group uses the left-shift operator (not streams) to add
a series of tokens to an existing set. It is useful for adding a group of tokens
to an existing set, and is a lot safer than the first group. There no need to
terminate these lists with <TT>badsy</TT>.
<P> The actual sets of FIRST are prefixed by the word <TT>First</TT>, as is
seen at the bottom of the previous listing. These were defined here mainly for
convenience. They could also have been made static local variables. The main
point of defining these here is to avoid having each rule (which can be called
recursively several times) define and redefine and redefine them, each time it
is called.
<P> The method of using variable argument lists, though slightly dangerous
(using variable argument lists is never 100% safe), is highly readable. The
programmer is cautioned to always terminate the variable argument lists with the
symbol <TT>badsy</TT>. The corresponding sets for the SAL compiler are located
in the files <TT>SAL-TokenSets.h</TT> and <TT>SAL-TokenSets.cpp</TT>.
<P> The file <TT>SAL-Error.cpp</TT> and <TT>SAL-Error.h</TT> contain
several useful functions for parsing, error-checking and error recovery.
<P> <B><TT>void Accept(symbol sy, Set &Follow, const char *_msg,
...);</TT></B> <BR>This is the easiest method for accepting a token. The first
parameter is the symbol for the token to be accepted. The second parameter is a
set of symbols for tokens that may follow <TT>sy</TT>. This set should be
(CURRENT-sy) + FOLLOW. The last parameter(s) are a series of normal
<TT>printf()</TT> arguments. If we wanted to use this function to accept the
token <TT>identsy</TT>, we would do something like this: <PRE> Current >>= identsy;
Accept(identsy, Current+Follow, "Identifier not found");</PRE>This
function was designed so that the user could treat it very much like a
<TT>printf()</TT> statement.
<P><B><TT>void AcceptStd(symbol sy, Set &Follow, int errnum);</TT></B>
<BR>There are certain error messages that are repeated fairly often, and as such
could get tedious to write over and over again. The Scanner is a good example of
this. It has a handful of error messages which can occur in a plethora of
different places. Some of the more common parser errors are for missing
delimeters, such as commas, semicolons, etc. If we wanted to scan for a
semicolon, we could do this: <PRE> Current >>= semicolonsy;
AcceptStd(semicolonsy, Current+Follow, ER_NOSEMICOLON);</PRE><B><TT>bool
TestToken(Set Current);</TT></B> <BR>This function provides a good way to test
and see if the symbol for a current token is an element of a set without
generating an er
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -