📄 context.htm
字号:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0047)http://cs.wwc.edu/~aabyan/464/Book/Context.html -->
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1458" name=GENERATOR></HEAD>
<BODY text=#000000 vLink=#51188e aLink=#ff0000 link=#0000ef bgColor=#ffffff>
<HR width="100%">
<CENTER>
<H1>Symbol Tables</H1></CENTER>
<HR width="100%">
<BR>Lex and Yacc files can be extended to handle the context sensitive
information. For example, suppose we want to require that, in Simple, we require
that variables be declared before they are referenced. Therefore the parser must
be able to compare variable references with the variable declarations.
<P>One way to accomplish this is to construct a list of the variables during the
parse of the declaration section and then check variable references against the
those on the list. Such a list is called a <I>symbol table</I>. Symbol tables
may be implemented using lists, trees, and hash-tables.
<P>We modify the Lex file to assign the global variable <TT>yylval</TT> to the
identifier string since the information will be needed by the attribute grammar.
<H2>The Symbol Table Module</H2>To hold the information required by the
attribute grammar we construct a symbol table. A symbol table contains the
environmental information concerning the attributes of various programming
language constructs. In particular, the type and scope information for each
variable. The symbol table will be developed as a module to be included in the
yacc/bison file.
<P>The symbol table for Simple consists of a linked list of identifiers,
initially empty. Here is the declaration of a node, initialization of the list
to empty, <BR>
<CENTER>
<TABLE border=0 NOSAVE>
<TBODY>
<TR NOSAVE>
<TD NOSAVE><TT>struct symrec<BR>{<BR> char
*name;
/* name of symbol
*/<BR> struct symrec *next; /* link
field
*/<BR>};<BR>typedef struct symrec symrec;<BR>symrec *sym_table = (symrec
*)0;<BR>symrec *putsym ();<BR>symrec *getsym
();</TT></TD></TR></TBODY></TABLE></CENTER>
<P>and two operations: putsym to put an identifier into the table, and
<TT>getsym</TT> which returns a pointer to the symbol table entry corresponding
to an identifier. <BR>
<CENTER>
<TABLE border=0 NOSAVE>
<TBODY>
<TR NOSAVE>
<TD NOSAVE><PRE><TT>symrec * putsym ( char *sym_name )
{
symrec *ptr;
ptr = (symrec *) malloc (sizeof(symrec));
ptr$->$name = (char *) malloc (strlen(sym_name)+1);
strcpy (ptr$->$name,sym_name);
ptr$->$next = (struct symrec *)sym_table;
sym_table = ptr;
return ptr;
}</TT></PRE><TT>symrec * getsym ( char *sym_name )<BR>{<BR> symrec
*ptr;<BR> for (ptr = sym_table; ptr != (symrec *)
0;<BR> ptr = (symrec
*)ptr$->$next)<BR> if (strcmp
(ptr$->$name,sym_name) == 0)<BR> return
ptr;<BR> return 0;<BR>}</TT></TD></TR></TBODY></TABLE></CENTER>
<H2>The Parser Modifications</H2>The Yacc/Bison file is modified to include the
symbol table and with routines to perform the installation of an indentifier in
the symbol table and to perform context checking. <BR>
<CENTER>
<TABLE border=0 NOSAVE>
<TBODY>
<TR NOSAVE>
<TD NOSAVE><TT>%{<BR>#include
$<$stdlib.h$>$ /* For malloc in
symbol table */<BR>#include
$<$string.h$>$ /* For strcmp in
symbol table */<BR>#include
$<$stdio.h$>$ /* For error
messages */<BR>#include
"ST.h" /* The
Symbol Table Module */<BR>#define YYDEBUG
1 /* For
debugging
*/<BR>install ( char *sym_name )<BR>{ symrec *s;<BR> s =
getsym (sym_name);<BR> if (s ==
0)<BR> s = putsym
(sym_name);<BR> else {
errors++;<BR>
printf( "%s is already defined\n", sym_name );<BR>
}<BR>}<BR>context_check( char *sym_name )<BR>{ if ( getsym( sym_name ) ==
0 ) <BR> printf( "%s is an undeclared
identifier\n", sym_name );<BR>}<BR>%}<BR></TT><I>Parser
declarations<BR></I><TT>%%<BR></TT><I>Grammar rules and
actions<BR></I><TT>%%<BR></TT><I>C
subroutines</I></TD></TR></TBODY></TABLE></CENTER>
<P>Since the scanner (the Lex file) will be returning identifiers, a semantic
record (static semantics) is required to hold the value and <TT>IDENT</TT>
is associated with that semantic record. <BR>
<CENTER>
<TABLE border=0 NOSAVE>
<TBODY>
<TR NOSAVE>
<TD NOSAVE><I>C declarations<BR></I><TT>%union
{
/* SEMANTIC
RECORD
*/<BR>char
*id;
/* For returning identifiers */<BR>}<BR>%token INT SKIP IF THEN ELSE
FI WHILE DO END <BR>%token <id> IDENT /* Simple
identifier */<BR>%left '-' '+'<BR>%left '*'
'/'<BR>%right '^' <BR>%%<BR></TT><I>Grammar rules and
actions<BR></I><TT>%%<BR></TT><I>C
subroutines</I></TD></TR></TBODY></TABLE></CENTER>The context free-grammar is
modified to include calls to the install and context checking functions.
<TT>$<I>n</I></TT> is a variable internal to Yacc which refers to the semantic
record corresponding the the n-th symbol on the right hand side of a production.
<BR>
<CENTER>
<TABLE border=0 NOSAVE>
<TBODY>
<TR NOSAVE>
<TD NOSAVE><I>C and parser
declarations<BR></I><TT>%%<BR>...<BR>declarations : /* empty
*/<BR> | INTEGER id_seq IDENTIFIER '.' { install( $3 );
}<BR>;<BR>id_seq : /* empty */<BR> | id_seq IDENTIFIER
',' { install( $2 ); }<BR>;<BR>command :
SKIP<BR> | READ IDENTIFIER { context_check( $2 );
}<BR> | IDENT ASSGNOP exp { context_check( $2 );
}<BR>...<BR>exp :
INT <BR>
|
IDENT
{ context_check( $1 ); } <BR>...<BR>%%<BR></TT><I>C
subroutines</I></TD></TR></TBODY></TABLE></CENTER>
<P>In this implementation the parse tree is implicitly annotated with the
information concerning whether a variable is assigned to a value before it is
referenced in an expression. The annotations to the parse tree are collected
into the symbol table.
<H2>The Scanner Modifications</H2>The scanner must be modified to return the
literal string associated each identifier (the semantic value of the token). The
semantic value is returned in the global variable <TT>yylval</TT>.
<TT>yylval</TT>'s type is a union made from the <TT>%union</TT> declaration in
the parser file. The semantic value must be stored in the proper member of the
union. Since the union declaration is:
<CENTER>
<TABLE border=0 NOSAVE>
<TBODY>
<TR NOSAVE>
<TD NOSAVE>%union { char *id; <BR>}</TD></TR></TBODY></TABLE></CENTER>
<P>The semantic value is copied from the global variable <TT>yytext</TT> (which
contains the input text) to <TT>yylval.id</TT>. Since the function
<TT>strdup</TT> is used (from the library <TT>string.h</TT>) the library must be
included. The resulting scanner file is: <BR>
<CENTER>
<TABLE border=0 NOSAVE>
<TBODY>
<TR NOSAVE>
<TD NOSAVE><TT>%{<BR>#include <string.h> /*
for
strdup
*/ <BR>#include "Simple.tab.h" /* for token definitions
and yylval */<BR>%}<BR>DIGIT
[0-9]<BR>ID
[a-z][a-z0-9]*<BR>%%<BR>":=" {
return(ASSGNOP); }<BR>{DIGIT}+ {
return(NUMBER);
}<BR>do {
return(DO);
}<BR>else {
return(ELSE);
}<BR>end {
return(END);
}<BR>fi {
return(FI);
}<BR>if {
return(IF);
}<BR>in {
return(IN); }<BR>integer {
return(INTEGER); }<BR>let {
return(LET);
}<BR>read {
return(READ);
}<BR>skip {
return(SKIP);
}<BR>then {
return(THEN); }<BR>while {
return(WHILE); }<BR>write {
return(WRITE); }<BR>{ID} {
yylval.id = (char *)
strdup(yytext); <BR>
return(IDENTIFIER);}<BR>[ \t\n]+ /* eat up whitespace
*/<BR>. {
return(yytext[0]);}<BR>%%</TT></TD></TR></TBODY></TABLE></CENTER>
<H2>Intermediate Representation</H2>Most compilers convert the source code to an
intermedate representation during this phase. In this example, the intermediate
representation is a parse tree. The parse tree is held in the stack but it could
be made explicit. Other popular choices for intermediate representation include
abstract parse trees, three-address code, also known as quadruples, and post fix
code. In this example we have chosen to bypass the generation of an intermediate
representation and go directly to code generation. The principles illustrated in
the section on code generation also apply to the generation of intermediate
code. </BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -