📄 chapter 6 assignment and expressions.htm


        var
          y: card;
          z: int;

      begin
        y:= 10;
        //   ^-- Make this 10 be a card32

        z:= 10;
        //   ^-- Make this 10 be an int32

      end program.
</PRE>Here we have two variables. One is a <TT>card</TT> and the other is an 
<TT>int</TT>. The problem comes up when we are trying to load the value of 10 
into either one. In one instance we want 10 to be a <TT>card32</TT> (the default 
size for a <TT>card</TT>), and in another instance we are making it be an 
<TT>int32</TT> (the default size for an <TT>int</TT>). 
<P>This problem is solved by having rule Expression take a "hint" type that it 
passes down through each rule until rule Factor is called. Rule Factor then takes 
this hint (which is contained in <TT>RType</TT>) and uses it to set the constant's 
type. So, in the case of variable <TT>y</TT> in the previous example, 
<TT>RuleAssignment()</TT> calls <TT>RuleIdentExpr()</TT> and gets the type for 
the variable <TT>y</TT>. This type is a (<TT>cardtyp</TT>, <TT>_u32s</TT>). 
Next, <TT>RuleExpression()</TT> is called, and the type for <TT>y</TT> is passed 
in. <TT>RuleExpression()</TT> calls other rules, which in turn eventually call 
<TT>RuleFactor()</TT>, which will be given the type of <TT>y</TT>. 
<TT>RuleFactor()</TT> can then use this information to make sure that the number 
10 is emitted in the proper size: a dword. It ends up looking like this: 
<P><PRE>      LID    0000000A  ; The number 10 in 32-bit hex
</PRE>When we get into <TT>RuleFactor()</TT>, we must check to see if the value 
of the constant fits the size that was indicated. For instance, if we have a 
variable that is an <TT>int16</TT>, we could not assign a literal value of 
50,000 to it. It just wouldn't fit. However, this does not always constitute an 
error. Consider the statement, <PRE>      write 9123456789;
</PRE>The number 9123456789 is obviously too big for an <TT>int32</TT>, but that 
does not matter. We only check whether the number can be expressed within the 
bounds of the type that <TT>RType</TT> is set to. If the constant fits, we leave 
<TT>RType</TT> alone; if it does not, we pick the next larger type that the 
constant will fit into, staying in the same domain where possible. For 
instance, if the constant is 50000 and <TT>int16</TT> was specified, we do not 
set <TT>RType</TT> to <TT>card16</TT>. Instead, we use <TT>int32</TT>. <PRE>       Type    Minimum                       Maximum
      ====================================================================
       int8    -128                           127
       int16   -32,768                        32,767
       int32   -2,147,483,648                 2,147,483,647
       int64   -9,223,372,036,854,775,808     9,223,372,036,854,775,807
       card8    0                             255
       card16   0                             65,535
       card32   0                             4,294,967,295
       card64   0                             18,446,744,073,709,551,615
</PRE>By using this method, we make it the responsibility of the calling rule to 
verify that the type that we return is acceptable. This is something that they 
should do, anyway. 
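<P>The widening rule described above can be sketched as follows. This is only an illustration under stated assumptions: the <TT>LiteralType</TT> structure, the <TT>MaxValue()</TT> helper, and the <TT>ResolveLiteral()</TT> function are hypothetical names, not part of the compiler's actual interface. Only the tags <TT>inttyp</TT> and <TT>cardtyp</TT> come from the text. The literal is treated as a non-negative magnitude.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the compiler's type tags; only the names
// inttyp and cardtyp come from the text.
enum Domain { inttyp, cardtyp };

// Hypothetical pairing of a domain with a size in bits (8, 16, 32, or 64).
struct LiteralType {
    Domain domain;
    int    bits;
};

// Largest value a non-negative literal may have in the given type.
static std::uint64_t MaxValue(Domain d, int bits) {
    if (d == cardtyp)
        return bits == 64 ? UINT64_MAX : (std::uint64_t{1} << bits) - 1;
    return (std::uint64_t{1} << (bits - 1)) - 1;
}

// Keep the hinted type if the value fits; otherwise widen within the same
// domain. Returns false when even the 64-bit type of that domain is too
// small, leaving the decision to the caller.
bool ResolveLiteral(std::uint64_t value, LiteralType hint, LiteralType &out) {
    for (int bits = hint.bits; bits <= 64; bits *= 2) {
        if (value <= MaxValue(hint.domain, bits)) {
            out = LiteralType{hint.domain, bits};
            return true;
        }
    }
    return false;
}
```

With a hint of <TT>int16</TT>, the value 50000 resolves to <TT>int32</TT> rather than <TT>card16</TT>, and with a hint of <TT>int32</TT>, the value 9123456789 resolves to <TT>int64</TT>. The calling rule still has to decide whether the returned type is acceptable.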
<P>
<H4>6.3.1.4 Emitting a Real (semi-optional)</H4>We have a similar problem for 
real numbers as for integers and cardinals. <B>You should emit 64-bit reals; 
32-bit is optional.</B> Again, the source of the problem is that of overlapping 
domains. The problem is handled in much the same way. Use the following table to 
discern the limits of floating point numbers: <PRE>       Type    Largest magnitude             Smallest magnitude
      ====================================================================
       real32  +|- 3.40282347E+38            +|- 1.40129846E-45
       real64  +|- 1.79769313486231570E+308  +|- 4.94065645841246544E-324
</PRE>This is consistent with the IEEE 754 floating-point standard. 
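<P>A range check for real literals might look like the sketch below. It assumes the host compiler's <TT>float</TT> is an IEEE 754 32-bit value, and the names <TT>RealSize</TT>, <TT>FitsReal32()</TT>, and <TT>ResolveReal()</TT> are hypothetical. It checks magnitude only and does not consider loss of precision.

```cpp
#include <cmath>
#include <limits>

// Hypothetical subtype tags for the two real sizes.
enum RealSize { real32, real64 };

// Returns true if a literal of this magnitude can be held in a real32
// without overflowing to infinity or underflowing to zero.
// Precision loss is deliberately not considered in this sketch.
bool FitsReal32(double value) {
    const double mag = std::fabs(value);
    if (mag == 0.0) return true;
    return mag <= std::numeric_limits<float>::max() &&
           mag >= std::numeric_limits<float>::denorm_min();
}

// Keep the hinted size if the value fits; otherwise promote to real64.
RealSize ResolveReal(double value, RealSize hint) {
    if (hint == real32 && !FitsReal32(value)) return real64;
    return hint;
}
```

As with integers, the hint is kept when the literal fits and widened otherwise, so a literal such as 1E+39 with a <TT>real32</TT> hint comes back as <TT>real64</TT>.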
<P>
<H3>6.3.2 Emitting code for a literal string constant</H3><!----------------------------------------------------------------------------->When 
the string was recognized by the scanner, it should have been stored in a 
temporary buffer. The result of a string expression is always an address to the 
start of the string. Suppose we have this line: <PRE>      write "Hello, world!\n";
</PRE>The parser works its way into rule WriteStatement, where it begins calling 
rule Expression over and over again. A string literal is a special case of 
expression. Its type is <TT>arraytyp</TT>, its subtype is <TT>_u8</TT>, and its 
block pointer is NULL. This is how your compiler is supposed to tell a string 
apart from a simple array of characters. 
<P>When the compiler recognizes a string, it is always supposed to generate an 
address to the start of the string. All strings are stored in the string data 
area of the module. We should never have to worry about strings stored in other 
modules, since strings are never accessed externally. There are no symbolic 
string constants in SAL. 
<P>Generating the string's address is done by using the <TT>LSTA</TT> 
instruction. It takes a 32-bit immediate parameter for the string's offset in the 
SDA (the string data area). There is a function that packs the string and gets 
the offset in one fell swoop, called <TT>PackStringLiteral()</TT>. Here is its prototype: <PRE>      DWORD PackStringLiteral(const char *str);
</PRE>As we can see, this function takes a pointer to a string and returns the 
offset at which the string will be stored. The rest of the matter is trivial. 
The compiler code that emits the instruction looks like this: <PRE>      DWORD stroffs = PackStringLiteral(Token.data.str.txt);
      Emit (xLSTA, stroffs);
</PRE>The only remaining thing to do is properly fill in the fields of RType. 
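<P>To make the idea of "packing" concrete, here is one way a function with this prototype could behave. This is an assumption for illustration, not the actual implementation: the <TT>StringDataArea</TT> buffer is hypothetical, <TT>DWORD</TT> is assumed to be a 32-bit unsigned integer, and storing a terminating NUL with each string is also an assumption.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

typedef std::uint32_t DWORD;  // assumed to be a 32-bit unsigned integer

// Hypothetical string data area: one growing byte buffer per module.
static std::vector<char> StringDataArea;

// Append the string, including its terminating NUL (an assumption), to the
// string data area and return the offset at which it starts.
DWORD PackStringLiteral(const char *str) {
    const DWORD offset = static_cast<DWORD>(StringDataArea.size());
    const std::size_t len = std::strlen(str) + 1;  // +1 for the NUL
    StringDataArea.insert(StringDataArea.end(), str, str + len);
    return offset;
}
```

Each call returns the offset of the string just added, which is the value handed to <TT>LSTA</TT> as its immediate parameter.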
<P>
<H3>6.3.3 Other Things Factor Does</H3><!----------------------------------------------------------------------------->Factor 
calls rule Expression if it encounters parentheses. No code has to be generated 
to do this. This sort of thing is taken care of automatically. The other rule 
that Factor calls is IdentExpr. Nothing additional needs to be done for this, 
either. 
<P><!----------------------------------------------------------------------------->
<H2>6.4 Rule IdentExpr</H2><!----------------------------------------------------------------------------->This 
is one of the most important rules in the compiler. 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/RULE23.gif"> 
  <P><FONT face=arial size=-1><B>Figure {RULE23}.</B></FONT> </P></MENU>Based on 
the type of identifier returned by <TT>RuleQualIdent()</TT>, we will process a 
variable, a constant, a type conversion, a function/procedure call, or an array 
or record. The prototype for this function should look like this: <PRE>      void IdentExpr ( Set Follow, BOOLEAN MakeRValue, Type &amp;RType )
</PRE>The first parameter is self-explanatory. The second parameter is a 
command, telling <TT>RuleIdentExpr()</TT> to make an rvalue or an lvalue. The 
third parameter is a reference to a type, which is filled in by this function. 
<P>The first thing that we want to do in this function is get the current token 
(which we already know to be an identifier), and look it up in the symbol table. 
This is done through a call to <TT>QualIdent()</TT>. If <TT>QualIdent()</TT> is 
successful, it will return a pointer to a valid <TT>Ident</TT>. At this point, 
if the return value is <TT>NULL</TT>, we set <TT>RType</TT> to a no-type value. 
This is done by calling <TT>Type::Init()</TT>, and passing in <TT>notyp</TT> and 
<TT>nosubtyp</TT>, like so: <PRE>      RType.Init(notyp, nosubtyp);
</PRE>We then return. This should effectively be a signal to the caller that the 
identifier was not recognized. If <TT>QualIdent()</TT> returns a valid pointer, 
we initialize <TT>RType</TT> to the type of the identifier returned. This can 
also be done by a call to <TT>Type::Init()</TT>. <PRE>      RType.Init(ident);
</PRE>Once we have a pointer to an identifier in the symbol table, there are 
several levels at which we can analyze it. The topmost level tells us what the 
identifier is, whether it is a constant, a type, a variable, a procedure, or a 
function. In rule IdentExpr we branch based on this information. We can use 
<TT>Ident::getObj()</TT> to get the identifier's object type. Procedure and 
function calls will be discussed in chapter 8. We will discuss the rest of these 
in detail, beginning with symbolic constants. 
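<P>The branching on the identifier's object type can be pictured with the sketch below. The <TT>Ident</TT> class here is a minimal stand-in, not the real symbol-table class. Only <TT>constobj</TT> and <TT>varobj</TT> are named in the text; <TT>typeobj</TT>, <TT>procobj</TT>, <TT>funcobj</TT>, and the <TT>ClassifyIdent()</TT> function are assumed names used for illustration.

```cpp
#include <string>

// constobj and varobj are named in the text; the remaining enumerators are
// assumed names for the other object kinds it lists.
enum ObjKind { constobj, typeobj, varobj, procobj, funcobj };

// Minimal stand-in for a symbol-table entry.
class Ident {
public:
    explicit Ident(ObjKind k) : kind(k) {}
    ObjKind getObj() const { return kind; }
private:
    ObjKind kind;
};

// Returns a label for the branch that rule IdentExpr would take.
std::string ClassifyIdent(const Ident *ident) {
    if (ident == nullptr) return "unrecognized";  // caller sets RType to notyp
    switch (ident->getObj()) {
        case constobj: return "symbolic constant";
        case typeobj:  return "type conversion";
        case varobj:   return "variable";
        case procobj:  return "procedure call";
        case funcobj:  return "function call";
    }
    return "unrecognized";
}
```

In the compiler itself each branch would emit code and fill in <TT>RType</TT> rather than return a label; the labels only show which case is selected.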
<P>
<H3>6.4.1 Symbolic Constants</H3><!----------------------------------------------------------------------------->A 
symbolic constant is one that is defined in a <TT>const</TT> declaration. 
Calling <TT>Ident::getObj()</TT> will return the value <TT>constobj</TT>. Here 
is an example of a constant's declaration. <PRE>      program AConstant;

        const
          x = 100;

      begin
        write x;
      end program.
</PRE>All symbolic constants are rvalues; you can't take the address of a 
constant, nor can you assign anything to it. For the most part, dealing with a 
constant symbolically is the same as dealing with a constant literally. The only 
difference is that when a constant is literal we extract its value from a token. 
Here our constant is symbolic. We can cast our <TT>Ident</TT> pointer to a 
<TT>ConstantIdent</TT> pointer by calling <TT>Ident::toConstantIdent()</TT>. 
This will return a pointer that is a <TT>ConstantIdent</TT>. We can then 
retrieve a <TT>ConstRec</TT> that has stored the value by calling 
<TT>ConstantIdent::getValue()</TT>. 
<P>There does remain one issue with type resolution as mentioned in sections 
6.3.1.3 and 6.3.1.4, and that is that symbolic constants do have some type, 
whereas literal constants do not. Here is an example of two symbolic constants: <PRE>      const
        c1 = -100;    
        c2 =  100;
</PRE>At declaration, if a constant has a sign (either a plus or a minus) it 
must be an integer. Its type should be set to <TT>inttyp</TT>, and its subtype 
should be set to <TT>nosubtyp</TT>. This way we can make sure that signed 
symbolic constants are always treated as an integer. If there is no sign, then 
the constant's type should be set to <TT>cardtyp</TT>, and the subtype should be 
<TT>nosubtyp</TT>. 
<P>Setting the type in this manner at declaration time is a form of loose 
classification (note that we did not designate a subtype). The value of the 
symbolic constant is loosely classified as "signed" (definitely an integer), or 
"not signed" (possibly an integer, but could also be a cardinal). We use the type 
<TT>inttyp</TT> to show that the number needs to be an integer, and we use 
the type <TT>cardtyp</TT> to show that the number could be either an integer 
or a cardinal. 
<P><TT>RuleIdentExpr()</TT> uses the type field to resolve the type of integers 
and cardinals by applying the same rules as outlined in section 6.3.1.3, with 
the following additions. If the constant's type is <TT>inttyp</TT>, then 
<TT>RType.type</TT> must be set to <TT>inttyp</TT>. Perform bounds checking to 
make sure that the value of the constant fits into the number of bits specified 
by the hint. If the constant does not fit, set the subtype to the smallest one 
that the constant will fit into. 
<P>If the constant's type is <TT>cardtyp</TT>, then we use the hint value 
in <TT>RType</TT> in the same manner as described in section 6.3.1.3. Here 
<TT>cardtyp</TT> signifies that the type of the constant is still pending. As 
before, we check that the value fits into the specified subtype; otherwise we 
pick the smallest subtype that our number will fit into. 
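<P>The two cases for symbolic constants can be put together in a short sketch. The <TT>ConstValue</TT> and <TT>Resolved</TT> structures and the functions below are hypothetical and exist only to illustrate the rules. In particular, the constant is modeled as a sign flag plus a magnitude, which is an assumption about representation and not how <TT>ConstRec</TT> is actually laid out.

```cpp
#include <cstdint>

enum TypeTag { inttyp, cardtyp };

// Hypothetical model of a symbolic constant's value.
struct ConstValue {
    bool          hasSign;    // declared with a leading + or -
    bool          negative;   // the sign was a minus
    std::uint64_t magnitude;  // absolute value
};

// Hypothetical pairing of a type tag with a size in bits.
struct Resolved { TypeTag type; int bits; };

// True if the constant is representable in the given type and width.
static bool Fits(const ConstValue &c, TypeTag type, int bits) {
    if (type == cardtyp) {
        if (c.negative && c.magnitude != 0) return false;
        return bits == 64 || c.magnitude < (std::uint64_t{1} << bits);
    }
    const std::uint64_t limit = std::uint64_t{1} << (bits - 1);  // 2^(bits-1)
    return c.negative ? c.magnitude <= limit : c.magnitude < limit;
}

// Loose classification at declaration time: signed means integer,
// unsigned means "pending" and is recorded as cardtyp.
TypeTag DeclaredType(const ConstValue &c) {
    return c.hasSign ? inttyp : cardtyp;
}

// Resolution at the point of use, given the hint passed in through RType.
Resolved ResolveConstant(const ConstValue &c, Resolved hint) {
    const TypeTag type = (DeclaredType(c) == inttyp) ? inttyp : hint.type;
    if (Fits(c, type, hint.bits)) return Resolved{type, hint.bits};
    for (int bits = 8; bits < 64; bits *= 2)
        if (Fits(c, type, bits)) return Resolved{type, bits};
    // Falls back to 64 bits; a real compiler would report an error here
    // if the value does not fit even then.
    return Resolved{type, 64};
}
```

Under these assumptions, <TT>c1 = -100</TT> resolves to an integer even when the hint is <TT>card32</TT>, while <TT>c2 = 100</TT> simply adopts whatever type the hint supplies.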
<P>All these maneuvers are working around an important issue that compilers have 
to face. There is an ambiguity between integers and cardinals because the 
domains of the two overlap. Figure {INTDOM} gives a picture of this. 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/INTDOM.gif"> 
  <P><FONT face=arial size=-1><B>Figure {INTDOM}.</B> This figure shows the 
  overlapping domains of int16 and card16. This introduces an ambiguity in 
  certain areas of the compiler.</FONT> </P></MENU>When it comes to symbolic 
constants, we want to give ourselves as much latitude as possible in order to 
allow the language to be flexible. 
<H3>6.4.2 Variables</H3><!----------------------------------------------------------------------------->We 
know if our identifier is a variable when the call to <TT>Ident::getObj()</TT> 
yields a <TT>varobj</TT>. A variable can be either an lvalue or an rvalue. This 
does not necessarily complicate things. It just means that we use a bifurcated 
method when dealing with any type of variable, including records and arrays (we 
will talk about those later). However, as we shall see with functions, it is 
also possible to have a variable that is a parameter that is passed by 
reference. This is the closest that we will come to dealing with pointers at 
this time. A pointer <I>is</I> used for variables that are passed by reference. 
<P>
<H4>6.4.2.1 A Variable's Storage Location</H4>Variables can be either passed by 
value or passed by reference. Variables that are passed by value can be stored 
in one of four different areas. Once an identifier has been found in the symbol 
table and determined to be a variable, we can query its properties to find out 
where in memory its value will be stored at runtime. A variable can be stored in 
an external module, in global memory, in local memory, or in a parent function's 
local memory (previous scope). We test the variable in this order to determine 
its location: 
<OL>
  <LI><B>In an external module.</B> If <TT>Ident::getMod()</TT> returns a value 
  that is not equal to <TT>table.ModNum</TT>, we know that the variable is in a 
  different module. When a module imports a variable from an external module, 
  the compiler has to tell the VM to look in the other module's GDA. The 
  <TT>LGx</TT> instructions perform this task. 
  <P></P>
  <LI><B>In global memory.</B> The method <TT>Ident::getFuncLev()</TT> returns 
  the scope level of an identifier. If the value returned is zero, then we know 
  that the variable is global. This means all the global variables for the 
  current module. These variables are found at some offset from the G register 
  (the GDA), and the <TT>LGx</TT> instructions are used. 
  <P></P>