📄 chapter 4 the parserw.htm

📁 英文版编译器设计：里面详细介绍啦C编译器的设计
💻 HTM
📖 第 1 页 / 共 5 页
字号:
上一页 1 2 3 45
such a parser, we require the services of one additional tool: a facility to 
manage sets. 
<H3>4.5.3 Tools to Manage Sets</H3><!-------------------------------------------------------------------------------->The 
SAL compiler has a very robust set class, aptly named <TT>Set</TT>, which has 
been built for the purpose of managing sets of tokens, or more precisely, sets 
of symbols. 
<P>&nbsp;The set has no specific order, and only one of each item is allowed in 
the set at any given time. Its members are of the type <TT>symbol</TT>, which as 
you may recall from section 3.2.2 of the previous chapter, is the principle 
element of <TT>token</TT>. <TT>Set</TT> has three constructors: 
<MENU><B><TT>Set()</TT></B> <BR>Basic default constructor. All constructors 
  pre-initialize the set to the empty set. 
  <P>&nbsp;<B><TT>Set(symbol s)</TT></B> <BR>Initializes the set to contain only 
  the symbol, <TT>s</TT>. 
  <P>&nbsp;<B><TT>Set(Set &amp;other)</TT></B> <BR>Copy constructor. Copies 
  contents of <TT>other</TT> to <TT>*this</TT>. 
  <P>&nbsp;<B><TT>Set(symbol s, ...)</TT></B> <BR>Takes a series of symbols 
  terminated with <TT>badsy</TT>. <BR>&nbsp; <BR>&nbsp;</P></MENU>In these and all 
definitions, <TT>symbol</TT> is the set of all SAL symbols as defined in the 
file <TT>SAL-Scanner.h</TT>. The last constructor is useful for defining a set 
and initializing it all at once. For example: <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Set ForSet(forsy, tosy, bysy, dosy, loopsy, badsy);</PRE>Make 
sure that the symbol <TT>badsy</TT> is always at the end of the argument list 
when using this constructor. In addition to these constructors, there is also a 
host of functions to operate on the <TT>Set</TT> class. 
<MENU><B><TT>void AddMember(symbol s)</TT></B> <BR>Adds a single 
  <TT>symbol</TT> to the <TT>Set</TT>. This function is declared as inline. 
  <P>&nbsp;<B><TT>void DelMember(symbol s)</TT></B> <BR>Deletes a single 
  <TT>symbol</TT> from the <TT>Set</TT>. This function is declared as inline. 
  <P>&nbsp;<B><TT>bool TestMember(symbol s)</TT></B> <BR>Tests a single member 
  of the set. Returns <TT>true</TT> if the member is set, <TT>false</TT> 
  otherwise. This function is declared as inline. 
  <P>&nbsp;<B><TT>void Empty()</TT></B> <BR>Returns true if the set is empty. 
  <P>&nbsp;<B><TT>void Assign(const Set &amp;other)</TT></B> <BR>This function 
  is called by the copy constructor, and basically does the same job. The 
  contents of <TT>other</TT> are copied into the set. 
  <P>&nbsp;<B><TT>bool Equals(Set &amp;other) const</TT></B> <BR>returns 
  <TT>true</TT> if <TT>other</TT> is the same. 
  <P>&nbsp;<B><TT>void Inverse()</TT></B> <BR>Toggles all the contents of a set. 

  <P>&nbsp;<B><TT>void Diff(const Set &amp;other)</TT></B> <BR>Removes from a 
  set all the members found in <TT>other</TT>. 
  <P>&nbsp;<B><TT>void Intersect(const Set &amp;other)</TT></B> <BR>Performs a 
  set intersection with <TT>other</TT>. 
  <P>&nbsp;<B><TT>void Union(const Set &amp;other)</TT></B> <BR>Performs a set 
  union with <TT>other</TT>. 
  <P>&nbsp;<B><TT>void Clear()</TT></B> <BR>Clears all members from the set. 
  <P>&nbsp;<B><TT>void Add(symbol s, ...)</TT></B> <BR>Adds a long series of 
  <TT>symbol</TT> arguments to the set. Argument list <I>must</I> be terminated 
  by <TT>badsy</TT>. 
  <P>&nbsp;<B><TT>void Del(symbol s, ...)</TT></B> <BR>Deletes a long series of 
  <TT>symbol</TT> arguments to the set. Argument list <I>must</I> be terminated 
  by <TT>badsy</TT>. Be careful using these two functions. 
  <P>&nbsp;<B><TT>void SetTo(symbol s, ...)</TT></B> <BR>Deletes all elements of 
  the set and resets it to the argument list. This set is useful for zeroing out 
  a set and setting it to a list of items in one step. Argument list <I>must</I> 
  be terminated by <TT>badsy</TT>. <BR>&nbsp; <BR>&nbsp;</P></MENU>As you can see, 
the set of functions to perform operations on the set class is fairly complete. 
In addition to these functions, a series of operators has also been defined. 
<BR>&nbsp; <BR>&nbsp; 
<MENU><B><TT>Set &amp;operator = (const Set other)</TT></B> <BR>Performs 
  assignment. This operator is declared as inline. 
  <P>&nbsp;<B><TT>Set &amp;operator &amp;= (const Set other)</TT></B> 
  <BR>Performs intersection-assignment. This operator is declared as inline. 
  <P>&nbsp;<B><TT>Set &amp;operator |= (const Set other)</TT></B> <BR>Performs a 
  union-assignment. This operator is declared as inline. 
  <P>&nbsp;<B><TT>Set &amp;operator -= (const Set other)</TT></B> <BR>Performs a 
  diff-assignment on the set with <TT>other</TT>. This operator is declared as 
  inline. 
  <P>&nbsp;<B><TT>Set &amp;operator &lt;&lt;= (const symbol s)</TT></B> <BR>Adds 
  symbol <TT>s</TT> to a set. This operator is declared as inline. 
  <P>&nbsp;<B><TT>Set &amp;operator &gt;&gt;= (const symbol s)</TT></B> 
  <BR>Deletes symbol <TT>s</TT> to a set. This operator is declared as inline. 
  <P>&nbsp;<B><TT>bool operator == (Set a, Set b)</TT></B> <BR>Compares two 
  sets. 
  <P>&nbsp;<B><TT>Set operator ~ (Set a)</TT></B> <BR>Returns the inverse of the 
  set (i.e., does not inverse the set, itself, but merely returns a value 
  inverse to the original set). 
  <P>&nbsp;<B><TT>Set operator + (Set a, Set b)</TT></B> <BR><B><TT>Set operator 
  + (Set a, Symbol s)</TT></B> <BR><B><TT>Set operator + (Symbol a, Symbol 
  b)</TT></B> <BR>Returns a set that is a union of the two parameters. Works the 
  same as the | operator. 
  <P>&nbsp;<B><TT>Set operator - (Set a, Set b)</TT></B> <BR><B><TT>Set operator 
  - (Set a, Symbol s)</TT></B> <BR>Returns the difference, <TT>a</TT> - 
  <TT>b</TT>, or <TT>a</TT> - <TT>s</TT>. 
  <P>&nbsp;<B><TT>Set operator &amp; (Set a, Set b)</TT></B> <BR>Returns the 
  intersection of <TT>a</TT> and <TT>b</TT>. 
  <P>&nbsp;<B><TT>Set operator | (Set a, Set b)</TT></B> <BR>Returns the union 
  of <TT>a</TT> and <TT>b</TT>. 
  <P>&nbsp;<B><TT>Set operator &lt;&lt; (Set a, symbol s)</TT></B> <BR>Returns 
  the set, <TT>a</TT> | {<TT>s</TT>}. 
  <P>&nbsp;<B><TT>Set operator &gt;&gt; (Set a, symbol s)</TT></B> <BR>Returns 
  the set, <TT>a</TT> - {<TT>s</TT>}. <BR>&nbsp; <BR>&nbsp;</P></MENU>The 
definition of CURRENT that we have discussed previously has been a purely 
mathematical one. As computer scientists however, we know that mathematics and 
practical implemtntation are not always synonymous. The set class that we 
presented here is both parsimonious and fast, but it has one major limitation: 
it does not allow duplicate items. 
<P>&nbsp;In the case of FOLLOW we can overcome this limitation by always passing 
the set in <I>by value</I> each time a rule is called. Thus, a rule is 
guaranteed that its FOLLOW will never be corrupted by any rule that it calls. 
<P>&nbsp;A more subtle problem arises in maintaining CURRENT, which can take on 
two related forms: 
<OL>
  <LI>A rule contains a loop. 
  <LI>A rule calls another more than once, or uses the same token more than 
  once. </LI></OL>Since the our set class does not allow duplicates, we have to be 
careful that when we remove used tokens we do not remove items that might appear 
again, later on in the rule. This is best accomplished by a mixture of simply 
deleting the accepted token, and scrapping CURRENT and rebuilding it from 
scratch. The first case is feasible when we are <I>sure</I> that there is no 
duplication in CURRENT, i.e., we know that we can remove element x or set 
FIRST-X, because we know that x will not appear later on in the rule, or that 
none of the elements of FIRST-X will appear later on in the rule. If we know 
that a single token will appear later on, then we need to leave it in CURRENT 
until we no longer need it. On the other hand, if one or more elements of 
FIRST-X will be needed later on, then CURRENT should be rebuilt from scratch. 
<BR>&nbsp; <BR>&nbsp; 
<H3>4.5.4 SAL Parser and Error-Checking Facilities</H3><!-------------------------------------------------------------------------------->Let's 
put this all together and rework the previous example to contain error-checking. 
At this time we should also adapt this example so that it looks more like the 
implementation of the SAL compiler. The SAL compiler contains many facilities to 
aid error-checking and synchronization, as well as the set class that we have 
just presented. 
<P>&nbsp;In the SAL compiler, all the sets of FIRST are defined as global 
constants. There are three major groups of #defines for constructing sets. There 
is a comma series, best used when calling the constructor of a set, and the 
other much safer series uses the left-shift operator. All members of the comma 
series are prefixed by an <TT>_f</TT>, and all members of the left-shift series 
are prefixed by a single <TT>f</TT>. In both cases, f stands for first. For our 
example, the FIRSTS are predefined as follows. <BR>&nbsp; <BR>&nbsp; <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define _fFactor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; idconsy, lparentsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define _fTerm&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; idconsy, lparentsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define _fExpression&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; idconsy, lparentsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define _fAssignmentStatement idconsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define _fIfStatement&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ifsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define _fStatements&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; idconsy, ifsy

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define fFactor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; idconsy &lt;&lt; lparentsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define fTerm&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; idconsy &lt;&lt; lparentsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define fExpression&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; idconsy &lt;&lt; lparentsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define fAssignmentStatement&nbsp; idconsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define fIfStatement&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ifsy
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #define fStatements&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; idconsy &lt;&lt; ifsy

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const Set FirstFactor(_fAssignmentStatement, badsy);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const Set FirstTerm(_fTerm, badsy);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const Set FirstExpression(_fExpression, badsy);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const Set FirstAssignmentStatement(_fAssignmentStatement, badsy);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const Set FirstIfStatement(_fIfStatement, badsy);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const Set FirstStatements(_fStatements, badsy);</PRE>The 
<TT>_f</TT> group is used in variable argument lists. You can see an example of 
its use in the third group. This group is really useful for initializing 
CURRENT. This group is also useful when calling <TT>Set::SetTo()</TT>. 
<P>&nbsp;The <TT>f</TT> group uses the left-shift operator (not streams) to add 
a series of tokens to an existing set. It is useful for adding a group of tokens 
to an existing set, and is a lot safer than the first group. There no need to 
terminate these lists with <TT>badsy</TT>. 
<P>&nbsp;The actual sets of FIRST are prefixed by the word <TT>First</TT>, as is 
seen at the bottom of the previous listing. These were defined here mainly for 
convenience. They could also have been made static local variables. The main 
point of defining these here is to avoid having each rule (which can be called 
recursively several times) define and redefine and redefine them, each time it 
is called. 
<P>&nbsp;The method of using variable argument lists, though slightly dangerous 
(using variable argument lists is never 100% safe), is highly readable. The 
programmer is cautioned to always terminate the variable argument lists with the 
symbol <TT>badsy</TT>. The corresponding sets for the SAL compiler are located 
in the files <TT>SAL-TokenSets.h</TT> and <TT>SAL-TokenSets.cpp</TT>. 
<P>&nbsp;The file <TT>SAL-Error.cpp</TT> and <TT>SAL-Error.h</TT> contain 
several useful functions for parsing, error-checking and error recovery. 
<P>&nbsp;<B><TT>void Accept(symbol sy, Set &amp;Follow, const char *_msg, 
...);</TT></B> <BR>This is the easiest method for accepting a token. The first 
parameter is the symbol for the token to be accepted. The second parameter is a 
set of symbols for tokens that may follow <TT>sy</TT>. This set should be 
(CURRENT-sy) + FOLLOW. The last parameter(s) are a series of normal 
<TT>printf()</TT> arguments. If we wanted to use this function to accept the 
token <TT>identsy</TT>, we would do something like this: <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current &gt;&gt;= identsy;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Accept(identsy, Current+Follow, "Identifier not found");</PRE>This 
function was designed so that the user could treat it very much like a 
<TT>printf()</TT> statement. 
<P><B><TT>void AcceptStd(symbol sy, Set &amp;Follow, int errnum);</TT></B> 
<BR>There are certain error messages that are repeated fairly often, and as such 
could get tedious to write over and over again. The Scanner is a good example of 
this. It has a handful of error messages which can occur in a plethora of 
different places. Some of the more common parser errors are for missing 
delimeters, such as commas, semicolons, etc. If we wanted to scan for a 
semicolon, we could do this: <PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Current &gt;&gt;= semicolonsy;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; AcceptStd(semicolonsy, Current+Follow, ER_NOSEMICOLON);</PRE><B><TT>bool 
TestToken(Set Current);</TT></B> <BR>This function provides a good way to test 
and see if the symbol for a current token is an element of a set without 
generating an er
上一页 1 2 3 45
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -