var
   y: card;
   z: int;
begin
   y:= 10;
   //  ^-- Make this 10 be a card32
   z:= 10;
   //  ^-- Make this 10 be an int32
end program.
</PRE>Here we have two variables. One is a <TT>card</TT> and the other is an
<TT>int</TT>. The problem comes up when we are trying to load the value of 10
into either one. In one instance we want 10 to be a <TT>card32</TT> (the default
size for a <TT>card</TT>), and in another instance we are making it be an
<TT>int32</TT> (the default size for an <TT>int</TT>).
<P>This problem is solved by having rule Expression take a "hint" type that it
passes on to each rule until rule Factor is called. Rule Factor then takes this
hint (which is contained in RType) and uses it to set the constant's type. So,
in the case of variable <TT>y</TT> in the previous example,
<TT>RuleAssignment()</TT> calls <TT>RuleIdentExpr()</TT> and gets the type for
the variable <TT>y</TT>. This type is a (<TT>cardtyp</TT>, <TT>_u32s</TT>).
Next, <TT>RuleExpression()</TT> is called, and the type for <TT>y</TT> is passed
in. <TT>RuleExpression()</TT> calls other rules, which in turn eventually call
<TT>RuleFactor()</TT>, which will be given the type of <TT>y</TT>.
<TT>RuleFactor()</TT> can then use this information to make sure that the number
10 is emitted in the proper size: a dword. It ends up looking like this:
<P><PRE> LID 0000000A ; The number 10 in 32-bit hex
</PRE>When we get into <TT>RuleFactor()</TT>, we must check to see if the value
of the constant fits the size that was indicated. For instance, if we have a
variable that is an <TT>int16</TT>, we could not assign a literal value of
50,000 to it. It just wouldn't fit. However, this does not always constitute an
error. Consider the statement, <PRE> write 9123456789;
</PRE>The number 9123456789 is obviously too big for an <TT>int32</TT>, but that
does not matter. We check whether the constant can be expressed within the
bounds of the type that <TT>RType</TT> is set to. If it fits, we leave
<TT>RType</TT> alone. If it does not fit, we pick the next larger type that the
constant will fit into, and we try to stay within the same domain. For instance,
if the constant is 50000 and <TT>int16</TT> was specified, we do not set
<TT>RType</TT> to <TT>card16</TT>. Instead, we use <TT>int32</TT>. <PRE> Type      Minimum                       Maximum
 ====================================================================
 int8      -128                          127
 int16     -32,768                       32,767
 int32     -2,147,483,648                2,147,483,647
 int64     -9,223,372,036,854,775,808    9,223,372,036,854,775,807
 card8     0                             255
 card16    0                             65,535
 card32    0                             4,294,967,295
 card64    0                             18,446,744,073,709,551,615
</PRE>By using this method, we make it the responsibility of the calling rule to
verify that the type that we return is acceptable. This is something that they
should do, anyway.
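<P>The fitting rule above can be sketched as follows. This is only an
illustration under stated assumptions: <TT>Domain</TT>, <TT>LitType</TT>,
<TT>Fits()</TT>, and <TT>ResolveLiteral()</TT> are hypothetical names, not part
of the course framework, and the literal is assumed to arrive as a sign flag
plus an unsigned magnitude.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the compiler's type tags (assumed, not the real ones).
enum class Domain { Int, Card };
struct LitType { Domain domain; int bits; };   // bits is 8, 16, 32, or 64

// True when the literal lies inside the range given in the table above.
bool Fits(Domain d, int bits, bool negative, std::uint64_t mag) {
    if (d == Domain::Card) {
        if (negative && mag != 0) return false;            // cardinals have no negatives
        return bits == 64 || mag <= ((std::uint64_t{1} << bits) - 1);
    }
    std::uint64_t maxPos = (std::uint64_t{1} << (bits - 1)) - 1;  // e.g. 32,767 for int16
    return negative ? mag <= maxPos + 1 : mag <= maxPos;
}

// Starts at the hinted size and widens within the same domain until the
// literal fits. Returns false if no size in that domain can hold it, leaving
// the decision to the caller.
bool ResolveLiteral(LitType hint, bool negative, std::uint64_t mag, LitType &out) {
    for (int bits = hint.bits; bits <= 64; bits *= 2) {
        if (Fits(hint.domain, bits, negative, mag)) {
            out.domain = hint.domain;
            out.bits = bits;
            return true;
        }
    }
    return false;
}
```

With a hint of <TT>int16</TT>, the literal 50000 resolves to <TT>int32</TT>
rather than <TT>card16</TT>, because the loop never leaves the hinted domain.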
<P>
<H4>6.3.1.4 Emitting a Real (semi-optional)</H4>We have a similar problem for
real numbers as for integers and cardinals. <B>You should emit 64-bit reals,
32-bit is optional.</B> Again, the source of the problem is that of overlapping
domains. The problem is handled in much the same way. Use the following table to
discern the limits of floating point numbers: <PRE> Type      Largest magnitude             Smallest nonzero magnitude
 ====================================================================
 real32    +/- 3.40282347E+38            +/- 1.40129846E-45
 real64    +/- 1.79769313486231570E+308  +/- 4.94065645841246544E-324
</PRE>This is consistent with the IEEE 754 floating-point standard.
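<P>A range check for reals might look like the sketch below. It assumes the
scanner delivers the literal as a C++ <TT>double</TT> and that the hint is
simply a bit count. <TT>ResolveRealBits()</TT> is a hypothetical helper, and it
checks range only, not loss of precision.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns 32 when a real32 was hinted and the value lies within the real32
// range from the table above; otherwise returns 64.
int ResolveRealBits(int hintBits, double value) {
    if (hintBits == 32) {
        double mag = std::fabs(value);
        bool tooBig = mag > std::numeric_limits<float>::max();
        bool tooSmall = mag != 0.0 && mag < std::numeric_limits<float>::denorm_min();
        if (!tooBig && !tooSmall) return 32;
    }
    return 64;
}
```

As with integers, the caller remains responsible for deciding whether the
widened type is acceptable.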
<P>
<H3>6.3.2 Emitting code for a literal string constant</H3><!----------------------------------------------------------------------------->When
the string was recognized by the scanner, it should have been stored in a
temporary buffer. The result of a string expression is always an address to the
start of the string. Suppose we have this line: <PRE> write "Hello, world!\n";
</PRE>The parser works its way into rule WriteStatement, where it begins calling
rule Expression over and over again. A string literal is a special case of
expression. Its type is <TT>arraytyp</TT>, its subtype is <TT>_u8</TT>, and its
block pointer is NULL. This is how your compiler is supposed to tell a string
apart from a simple array of characters.
<P>When the compiler recognizes a string, it is always supposed to generate an
address to the start of the string. All strings are stored in the string data
area of the module. We should never have to worry about strings stored in other
modules, since strings are never accessed externally. There are no symbolic
string constants in SAL.
<P>Generating the string's address is done by using the <TT>LSTA</TT>
instruction. It takes a 32-bit immediate parameter for the string's offset in the
SDA. There is a function that packs the string and gets the offset in one fell
swoop, called <TT>PackStringLiteral()</TT>. Here is its prototype: <PRE> DWORD PackStringLiteral(const char *str);
</PRE>As we can see, this function takes a pointer to a string and returns the
offset at which the string will be stored. The rest of the matter is trivial.
The code to be generated looks like this: <PRE> DWORD stroffs = PackStringLiteral(Token.data.str.txt);
Emit (xLSTA, stroffs);
</PRE>The only remaining thing to do is properly fill in the fields of RType.
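<P>To make the idea of "packing" concrete, here is a minimal sketch of what a
string data area could do internally. It is not the framework's
<TT>PackStringLiteral()</TT>. The <TT>StringDataArea</TT> type is hypothetical,
and storing a terminating NUL after each string is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using DWORD = std::uint32_t;

// Hypothetical in-memory model of a module's string data area (SDA).
struct StringDataArea {
    std::vector<char> bytes;

    // Appends the string (plus its NUL terminator, by assumption) and
    // returns the offset at which it starts.
    DWORD Pack(const char *str) {
        DWORD offset = static_cast<DWORD>(bytes.size());
        bytes.insert(bytes.end(), str, str + std::strlen(str) + 1);
        return offset;
    }
};
```

The offset returned by such a routine is the operand that <TT>LSTA</TT> would
carry.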
<P>
<H3>6.3.3 Other Things Factor Does</H3><!----------------------------------------------------------------------------->Factor
calls rule Expression if it encounters parentheses. No code has to be generated
to do this. This sort of thing is taken care of automatically. The other rule
that Factor calls is IdentExpr. Nothing additional needs to be done for this,
either.
<P><!----------------------------------------------------------------------------->
<H2>6.4 Rule IdentExpr</H2><!----------------------------------------------------------------------------->This
is one of the most important rules in the compiler.
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/RULE23.gif">
<P><FONT face=arial size=-1><B>Figure {RULE23}.</B></FONT> </P></MENU>Based on
the type of identifier returned by <TT>RuleQualIdent()</TT>, we will process a
variable, a constant, a type conversion, a function/procedure call, or an array
or record. The prototype for this function should look like this: <PRE> void IdentExpr ( Set Follow, BOOLEAN MakeRValue, Type &RType )
</PRE>The first parameter is self-explanatory. The second parameter is a
command, telling <TT>RuleIdentExpr()</TT> to make an rvalue or an lvalue. The
third parameter is a reference to a type, which is filled in by this function.
<P>The first thing that we want to do in this function is get the current token
(which we already know to be an identifier), and look it up in the symbol table.
This is done through a call to <TT>QualIdent()</TT>. If <TT>QualIdent()</TT> is
successful, it will return a pointer to a valid <TT>Ident</TT>. At this point,
if the return value is <TT>NULL</TT>, we set <TT>RType</TT> to a no-type value.
This is done by calling <TT>Type::Init()</TT>, and passing in <TT>notyp</TT> and
<TT>nosubtyp</TT>, like so: <PRE> RType.Init(notyp, nosubtyp);
</PRE>We then return. This should effectively be a signal to the caller that the
identifier was not recognized. If <TT>QualIdent()</TT> returns a valid pointer,
we initialize <TT>RType</TT> to the type of the identifier returned. This can
also be done by a call to <TT>Type::Init()</TT>. <PRE> RType.Init(ident);
</PRE>Once we have a pointer to an identifier in the symbol table, there are
several levels at which we can analyze it. The topmost level tells us what the
identifier is, whether it is a constant, a type, a variable, a procedure, or a
function. In rule IdentExpr we split off based on this information. We can use
<TT>Ident::getObj()</TT> to get the identifier's object type. Procedure and
function calls will be discussed in chapter 8. We will discuss the rest of these
in detail, beginning with symbolic constants.
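<P>The split described above can be pictured as a switch on the object type.
The sketch below uses a stub <TT>Ident</TT> so that it stands alone. Only
<TT>constobj</TT>, <TT>varobj</TT>, and <TT>getObj()</TT> come from the text;
the other enumerator names and <TT>DispatchIdent()</TT> are assumptions made for
illustration.

```cpp
#include <cassert>
#include <string>

// constobj and varobj appear in the text; the remaining names are assumed.
enum ObjKind { constobj, varobj, typeobj, procobj, funcobj };

// Stub standing in for the symbol table's Ident class.
struct Ident {
    ObjKind obj;
    ObjKind getObj() const { return obj; }
};

// Returns a label for the branch IdentExpr would take. A null pointer models
// a failed lookup, where RType is set to (notyp, nosubtyp) and we return.
std::string DispatchIdent(const Ident *ident) {
    if (ident == nullptr) return "notyp";
    switch (ident->getObj()) {
        case constobj: return "symbolic constant";
        case varobj:   return "variable";
        case typeobj:  return "type conversion";
        case procobj:
        case funcobj:  return "call";
    }
    return "notyp";
}
```

In the real rule, each branch would emit code and fill in <TT>RType</TT> rather
than return a label.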
<P>
<H3>6.4.1 Symbolic Constants</H3><!----------------------------------------------------------------------------->A
symbolic constant is one that is defined in a <TT>const</TT> declaration.
Calling <TT>Ident::getObj()</TT> will return the value <TT>constobj</TT>. Here
is an example of a constant's declaration. <PRE> program AConstant;
const
x = 100;
begin
write x;
end program.
</PRE>All symbolic constants are rvalues; you can't take the address of a
constant, nor can you assign anything to it. For the most part, dealing with a
constant symbolically is the same as dealing with a constant literally. The only
difference is that when a constant is literal we extract its value from a token.
Here our constant is symbolic. We can cast our <TT>Ident</TT> pointer to a
<TT>ConstantIdent</TT> pointer by calling <TT>Ident::toConstantIdent()</TT>.
This will return a pointer that is a <TT>ConstantIdent</TT>. We can then
retrieve a <TT>ConstRec</TT> that has stored the value by calling
<TT>ConstantIdent::getValue()</TT>.
<P>There does remain one issue with type resolution as mentioned in sections
6.3.1.3 and 6.3.1.4, and that is that symbolic constants do have some type,
whereas literal constants do not. Here is an example of two symbolic constants: <PRE> const
c1 = -100;
c2 = 100;
</PRE>At declaration, if a constant has a sign (either a plus or a minus) it
must be an integer. Its type should be set to <TT>inttyp</TT>, and its subtype
should be set to <TT>nosubtyp</TT>. This way we can make sure that signed
symbolic constants are always treated as an integer. If there is no sign, then
the constant's type should be set to <TT>cardtyp</TT>, and the subtype should be
<TT>nosubtyp</TT>.
<P>Setting the type in this manner at declaration time is a form of loose
classification (note that we did not designate a subtype). The value of the
symbolic constant is loosely classified as "signed" (definitely an integer), or
"not signed" (possibly an integer, but could also be cardinal). We use the type
of <TT>inttyp</TT> to show that the number needs to be an integer, and we use
the type of <TT>cardtyp</TT> to show that the number could be either an integer
or a cardinal.
<P><TT>RuleIdentExpr()</TT> uses the type field to resolve the type of integers
and cardinals, applying the rules outlined in section 6.3.1.3 with the
following additions. If the constant's type is <TT>inttyp</TT>, then
<TT>RType.type</TT> must be set to <TT>inttyp</TT>. Perform bounds checking to
make sure that the value of the constant fits into the number of bits that the
hint specifies. If the constant does not fit, set the subtype to the smallest
that the constant will fit into.
<P>If the constant's type is <TT>cardtyp</TT>, then we utilize the hint value
in <TT>RType</TT> in the same manner as described in section 6.3.1.3. Here
<TT>cardtyp</TT> signifies that the type of this constant is still pending.
As before, we check that the value will fit into the specified subtype;
otherwise we pick the smallest subtype that our number will fit into.
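<P>Put together, the resolution of a symbolic constant could be sketched like
this. The names <TT>Resolved</TT>, <TT>InRange()</TT>, and
<TT>ResolveSymbolic()</TT> are hypothetical, a plain bit count stands in for the
real subtype, and the value is assumed to fit in a signed 64-bit integer.

```cpp
#include <cassert>

enum BaseType { inttyp, cardtyp };
struct Resolved { BaseType type; int bits; };   // bits stands in for the subtype

// True when value lies within the range of the given type and width.
bool InRange(BaseType t, int bits, long long value) {
    if (t == cardtyp) {
        if (value < 0) return false;
        return bits >= 64 || value <= (1LL << bits) - 1;
    }
    if (bits >= 64) return true;
    long long maxPos = (1LL << (bits - 1)) - 1;
    return value >= -maxPos - 1 && value <= maxPos;
}

// declared: the loose class stored at declaration time
//           (inttyp = signed, cardtyp = type still pending).
// hint:     the type requested by the calling rule.
Resolved ResolveSymbolic(BaseType declared, Resolved hint, long long value) {
    // A signed constant is always an integer; an unsigned one follows the hint.
    BaseType t = (declared == inttyp) ? inttyp : hint.type;
    int bits = hint.bits;
    while (bits < 64 && !InRange(t, bits, value)) bits *= 2;   // widen until it fits
    return {t, bits};
}
```

For the declarations above, <TT>c1</TT> stays an integer even under a cardinal
hint, while <TT>c2</TT> takes on whichever type the hint asks for.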
<P>All these maneuvers are working around an important issue that compilers have
to face. There is an ambiguity between integers and cardinals because the
domains of the two overlap. Figure {INTDOM} gives a picture of this.
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/INTDOM.gif">
<P><FONT face=arial size=-1><B>Figure {INTDOM}.</B> This figure shows the
overlapping domains of int16 and card16. This introduces an ambiguity in
certain areas of the compiler.</FONT> </P></MENU>When it comes to symbolic
constants, we want to give ourselves as much latitude as possible in order to
allow the language to be flexible.
<H3>6.4.2 Variables</H3><!----------------------------------------------------------------------------->We
know if our identifier is a variable when the call to <TT>Ident::getObj()</TT>
yields a <TT>varobj</TT>. A variable can be either an lvalue or an rvalue. This
does not necessarily complicate things. It just means that we use a bifurcated
method when dealing with any type of variable, including records and arrays (we
will talk about those later). However, as we shall see with functions, it is
also possible to have a variable that is a parameter that is passed by
reference. This is the closest that we will come to dealing with pointers at
this time. A pointer <I>is</I> used for variables that are passed by reference.
<P>
<H4>6.4.2.1 A Variable's Storage Location</H4>Variables can be either passed by
value or passed by reference. Variables that are passed by value can be stored
in one of four different areas. Once an identifier has been found in the symbol
table and determined to be a variable, we can query its properties to find out
where in memory its value will be stored at runtime. A variable can be stored in
an external module, in global memory, in local memory, or in a parent function's
local memory (previous scope). We test the variable in this order to determine
its location:
<OL>
<LI><B>In an external module.</B> If <TT>Ident::getMod()</TT> returns a value
that is not equal to <TT>table.ModNum</TT>, we know that the variable is in a
different module. When a module imports a variable from an external module,
the compiler has to tell the VM to look in the other module's GDA. The
<TT>LGx</TT> instructions perform this task.
<P></P>
<LI><B>In global memory.</B> The method <TT>Ident::getFuncLev()</TT> returns
the scope level of an identifier. If the value returned is zero, then we know
that the variable is global. This covers all the global variables of the
current module. These variables are found at some offset from the G register
(the GDA), and the <TT>LGx</TT> instructions are used.
<P></P>