首页 › 资源下载 › 编译器/解释器 › 有用 › 源码查看
tutorial.html

来自「有用」· HTML 代码 · 共 1,071 行 · 第 1/2 页
HTML
1,071 行
<HTML><HEAD><TITLE>A Short Tutorial on the ACCENT Compiler Compiler</TITLE></HEAD><BODY bgcolor="white"><TABLE cellspacing=20>   <TR>      <TD valign="top">         <img src="logo.gif">      </TD>      <TD valign="bottom" align="left">         <a href="index.html">The Accent Compiler Compiler</a>         <h1>A Short Tutorial</h1>      </TD>   </TR>   <TR>      <TD align="right" valign="top">         <!-- MENU -->         <font face="helvetica">         <a href="index.html">Accent</a><br>         <a href="overview.html">Overview</a><br>         Tutorial<br>         <a href="language.html">Language</a><br>         <a href="installation.html">Installation</a><br>         <a href="usage.html">Usage</a><br>         <a href="lex.html">Lex</a><br>         <a href="algorithms.html">Algorithms</a><br>         <a href="distribution.html">Distribution</a><br>         </font>      </TD>      <TD valign="top">      <!--- begin main content -->The input of a computer program is often written in a specific language.This holds for traditional compilers but it is also true for many otherapplication programs. For example, a program that displays moleculesreads input written in a molecule language.<p>Accent is a tool that supports the development of language processors.It generates programs that process source text writtenin the language specified by the user and make the underlying structureof the text explicit. Following this structure semantic action linked tocertain constructs are executed.<ul><li><a href="#describe">How to Describe Languages</a><li><a href="#assign">How to Assign Meaning</a><li><a href="#abbreviate">How to Abbreviate Specifications</a><li><a href="#resolve">How to Resolve Ambiguities</a></ul><h1><a name="describe">How to Describe Languages</a></h1><h3>Grammars</h3>The user describes the language by providing a grammar.Such a grammar is given by rules.<p>A rule describes how to build a particular construct of the language.This is done by listing one or more alternativeshow to build the construct from constituents.<p>For example, if we define a programming language, we can express thefact that a <tt>program</tt> is constructed from a <tt>declaration_part</tt>and a <tt>statement_part</tt>by the rule<pre>   program : declaration_part statement_part ;</pre>A rule has a left hand side that names the construct defined by the rule.A colon  ("<tt>:</tt>") separates the left hand side from the right hand side.The right hand side specifies how to build the construct.A semicolon ("<tt>;</tt>") terminates the rule.<p>The name on the left hand side is called a nonterminal.A nonterminal may be used in right hand sides to specify constituents.The possible representations defined by a nonterminial are calledthe phrases of the nonterminal.<p>The rule above says that phrases for <tt>program</tt> arecomposed from a phrase for <tt>declaration_part</tt> followed by a phrasefor <tt>statement_part</tt>.<p>A rule may provide more than one alternative.Here is a specification of the nonterminal <tt>statement</tt>:<pre>   statement :      variable '=' expression   |  IF expression THEN statement ELSE statement   |  WHILE expression DO statement   |  BEGIN statement_seq END   ;</pre>It describes four alternatives how to construct a statement.Alternatives are separated by a bar ("<tt>|</tt>").<p>The nonterminal that is defined by the first rule of the grammaris called the start symbol.The phrases of the start symbol constitute the language defined by the grammar.<p>Grammars of this kind are called context-free grammars.Accent can process all context-free grammars without restriction.<h3>Lexical Elements</h3>Nonterminal symbols are defined by rules.There are also elementary itemscalled terminal symbols or tokens.<p>They may be given literally, if they are represented by a single character.For example, the <tt>'='</tt> element in the first alternative for <tt>statement</tt> stands for the character "<tt>=</tt>".<p>They may also be referred by a symbolic name such as <tt>IF</tt>in the second alternative for <tt>statement</tt>.In our language an <tt>IF</tt> symbol could be represented bythe two character string "<tt>if</tt>".Such a mapping from strings to tokens is not definedby the Accent specification.In Accent, one just introduces symbolic names that are used for terminalsymbols.<p>For example,<pre>   %token IF, THEN, ELSE, WHILE, DO, BEGIN, END;</pre>introduces names for the tokens in the rule for <tt>statement</tt>.<p>The declaration precedes the first rule of the grammar.<p>The actual representation is given by rules for a further tool:The generator Lex can be used to create a lexical analyzerthat partitions the input text into tokens. A specification for Lexis a list of lexical rules.<p>A lexical rule states a pattern (a regular expression) that matches the tokenand an action that is carried out when the token is recognized.Here is are two example rules:<pre>   "="  { return '='; }   "if" { return IF;  }</pre>If "<tt>=</tt>" is recognized then the code of the character <tt>'='</tt>is returned as an indicator.If "<tt>if</tt>" is recognized then the value of the constant <tt>IF</tt>is returned.<p>A token may have a more complex structure. An example is the token<tt>NUMBER</tt> which represents a sequence of digits.This can be specified by a Lex rule like this:<pre>[0-9]+ { return NUMBER; }</pre>A string that matches the pattern <tt>[0-9]+</tt>(i.e. a sequence of digits) is indicated as a <tt>NUMBER</tt>.<p>The lexical analyzer (described in the Lex specification)structures the input into a sequence of tokens.The syntactical analyzer (described in the Accent specification)hierarchically structures this sequence into phrases.<p>An item is specified as token if it has a fixed representationsuch as "<tt>=</tt>" or "<tt>if</tt>"or if it can be defined by a simple expression such as the patternfor <tt>NUMBER</tt>.In many cases tokens are those items that can be separatedby additional white space.A Lex rule can specify to skip white space so that it can be ignoredin the syntactical grammar.<p>The Lex rule<pre>   " " { /* skip blank */ }</pre>skips blanks by not returning a token indicator.<h3>A Grammar for Expressions</h3>We now present a simple but complete example:a grammar for expressions like<pre>   10+20*30</pre>Here is the Accent specification:<pre>   01  %token NUMBER;   02     03  expression :   04    term   05  ;   06     07  term :   08    term '+' factor   10  | term '-' factor   11  | factor   12  ;   13     14  factor :   15    factor '*' primary   16  | factor '/' primary   17  | primary   18  ;   19     20  primary :   21    NUMBER   22  | '(' term ')'   23  | '-' primary   24  ;</pre>These rules not only define what constitutes a valid expressionbut also give it structure.The different nonterminals reflect the binding strength of the operators.The operators of a factor ("<tt>*</tt>" and "<tt>/</tt>")have a stronger binding than theoperators of a term ("<tt>+</tt>" and "<tt>-</tt>")because <tt>factor</tt> is a constituent of a <tt>term</tt>(a <tt>term</tt> appearing inside a <tt>factor</tt> must be enclosed inparentheses).<p>For example, the input<pre>   10+20*30</pre>is structured as follows:<pre>   expression   |   +-term     |     +-term     | |     | +- factor     |    |     |    +-primary     |      |     |      +-NUMBER     |     +-'+'     |     +-factor       |       +-factor       | |       | +-primary       |   |       |   +-NUMBER       |       +-'*'       |       +-primary	 |         +-NUMBER</pre>or more precisely(this representation, generated with Accent, specifies which alternativehas been chosen and lists in curly braces the constituents):<pre>   expression alternative at line 4, col 6 of grammar {     term alternative at line 8, col 6 of grammar {       term alternative at line 10, col 8 of grammar {         factor alternative at line 16, col 9 of grammar {           primary alternative at line 20, col 8 of grammar {             NUMBER           }         }       }       '+'       factor alternative at line 14, col 8 of grammar {         factor alternative at line 16, col 9 of grammar {           primary alternative at line 20, col 8 of grammar {             NUMBER           }         }         '*'         primary alternative at line 20, col 8 of grammar {           NUMBER         }       }     }   }</pre>The tree above indicates that<pre>   10+20*30</pre>is structured as<pre>   10+(20*30)</pre>and not as<pre>   (10+20)*30</pre>Here is the Lex specification for the expression grammar:<pre>   %{   #include "yygrammar.h"   %}   %%   "+"    { return '+'; }   "-"    { return '-'; }   "*"    { return '*'; }   "/"    { return '/'; }   [0-9]+ { return NUMBER; }   " "    { /* skip blank */ }   \n     { /* skip newline */ }   .      { yyerror("illegal token"); }</pre>(The file <tt>yygrammar.h</tt>, which is included in the header of theLex specification, is generated by Accent and contains the definition ofthe constant <tt>NUMBER</tt>.)<h1><a name="assign">How to Assign Meaning</a></h1><h3>Semantic Actions</h3>From the above grammar Accent generates a program that analyzes itsinput syntactically: it rejects all texts that do not conform to the grammar.<p>In order to process the input semantically we haveto specify semantic actions.These actions may be embedded into the grammar at arbitrary positions.They are executed when the particular alternative is processed.The members of a selected alternative are processed from left to right.<p>A semantic action is arbitrary C code inclosed in curly braces.The text (without the braces) is copied verbatim into the generatedprogram.<p>Here is an example<pre>   N:      { printf("1\n"); } A { printf("2\n"); } B { printf("3\n"); }   |  { printf("x\n"); } C { printf("y\n"); }   ;   A: 'a' { printf("inside A\n"}; };   B: 'b' { printf("inside B\n"}; };   C: 'c' { printf("inside C\n"}; };</pre>For the input<pre>   a b</pre>the generated program produces the output<pre>   1   inside A   2   inside B   3</pre>For each nonterminal Accent generates a tree walker function.Here is the code generated for <tt>N</tt>(slightly edited and without <tt>#line</tt> pragmas for C preprocessor):<pre>   N ()   {      switch(yyselect()) {      case 1:	 {	    printf("1\n"); 	    A();	    printf("2\n"); 	    B();	    printf("3\n"); 	 }	 break;      case 2:	 {	    printf("x\n"); 	    C();	    printf("y\n"); 	 }	 break;      }   }</pre><h3>Attributes of Nonterminals</h3>Like functions in C, nonterminal can have parameters.Parameters may be of mode <tt>in</tt> or <tt>out</tt>.<tt>in</tt> parameters are used to pass information from the contextto a particular nonterminal (often called inherited attributes).<tt>out</tt> parameters pass information from a nonterminal to its context(often called synthesized attributes).At the left hand side of rule the name ofthe nonterminal is followed by a signature that specifies mode, type,and name of parameters.The signature is enclosed in the braces"<tt>&lt;</tt>" and "<tt>&gt;</tt>".<p>For example<pre>   N < %in int context, %out int result > : ... ;</pre><tt>N</tt> has an input parameter <tt>context</tt> and an output parameter<tt>result</tt>, both are of type <tt>int</tt>.<p>If a nonterminal appears on the right hand side, actual parametersfollow the nonterminal name, enclosed in"<tt>&lt;</tt>" and "<tt>&gt;</tt>".<p>For example<pre>   N&lt;actual_context, actual_result&gt;</pre>Parameters can be accessed inside semantic actions.The values of input parameters must be defined inside semantic actionsor be the output of other members.<p>For example<pre>   demo :      { actual_context = 10; }      N&lt;actual_context, actual_result&gt;      { printf("%d\n", actual_result); }   ;</pre>An alternative for a nonterminal must define its output parameters,either by using them as output parameters for  membersor by assigning a value inside a semantic action.If an output parameter of the left hand side (a formal output parameter)is used inside a semantic action, it must be dereferenced with the "<tt>*</tt>"operator (output parameters are passed by reference to the generatedtree walker function).<p>For example<pre>   N<%in int context, %out int result> : { *result = context+1; } ;</pre>An elaboration of <tt>demo</tt> prints <tt>11</tt>.<p>Here are the generated functions:<pre>demo (){   int actual_context;   int actual_result;   switch(yyselect()) {   case 1:      {	 actual_context = 10; 	 N(actual_context, &actual_result);	 printf("%d\n", actual_result);       }      break;   }}N (context, result)   int context;   int *result;{   switch(yyselect()) {   case 2:      {         *result = context+1;       }      break;   }}</pre>As you see, identifiers that appear as parameters of nonterminalsare automatically declared.<p>Formal parameters, if present, are specified in the form<pre>   < %in parameter_specifications %out parameter_specifications ></pre>where either the <tt>%in</tt> group or the <tt>%out</tt> group may be omitted.<p>The most frequent case, where we have only output parameters,can simply be written without a mode indicator:<pre>   < parameter_specifications ></pre><tt>parameter_specifications</tt> is a list of the form<pre>   Type_1 Name_1 , ... , Type_n Name_n</pre>The type may be omitted.In this case the special type <tt>YYSTYPE</tt> is assumed.This may be defined by the user as a macro.If there is no user specific definition <tt>YYSTYPE</tt>stands for <tt>long</tt>.<p>Hence in most cases a left hand side of a rule simply looks like this:<pre>   Block&lt;b&gt; : ... </pre><h3>Attributes of Tokens</h3>All items declared as tokens have an output parameter of type<tt>YYSTYPE</tt>. If a token is used on the right hand side of a rule,an actual parameter may be specified to access the attribute value ofthe token which is computed by the scanner.<p>For example<pre>    Value :       NUMBER&lt;n&gt; { printf("%d\n", n); }    ;  </pre>Here <tt>n</tt> represents the numeric value of <tt>NUMBER</tt>.It can be used in the semantic action.<p>The attribute value of a token must be computed in the semantic actionof the Lex rule for the token. It must be assigned to the specialvariable <tt>yylval</tt> which is of type <tt>YYSTYPE</tt>.<p>For example if we want to access the value of a <tt>NUMBER</tt>the corresponding Lex could be<pre>   [0-9]+ { yylval = atoi(yytext); return NUMBER; }</pre>The special variable <tt>yytext</tt> holds the string that matched thepattern. The C function <tt>atoi</tt> converts it into a integer.<h3>Global Prelude</h3>If <tt>YYSTYPE</tt> is defined by the user,it should be declared in an <tt>include</tt> file,
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?
tutorial.html

tutorial.html - 源码说明

⌨️ 快捷键说明