首页 › 资源下载 › 编译器/解释器 › 有用 › 源码查看
tutorial.html

来自「有用」· HTML 代码 · 共 1,071 行 · 第 1/2 页
HTML
1,071 行
because it is used in the grammar as well as in the Lex specification.<p><tt>#include</tt> statements can be placed in the global prelude partat the beginning of the grammar file.Text which is enclosed by<pre>   %prelude {</pre>and<pre>   }</pre>is copied verbatim into the generated program.<p>For example<pre>   %prelude {   #include "yystype.h"   }</pre><h3>Rule Prelude</h3>Identifiers that are used as attributes need not be declared.One may use semantic actions to declare additional variables(the curly braces surrounding a semantic do not appear in the generated code).<p>For example<pre>   demo :      {int i = 0;} alternative_1 ;   |  alternative_2   ;</pre>Such variables are local to the alternative.<tt>i</tt> is visible in the sematic action of <tt>alternative_1</tt>but not in those of <tt>alternative_2</tt><p>Variables that are visible to all alternatives (but local to the rule)can be declare in rule preludewhich has the same form as the global prelude but appears beforethe alternative list of a rule.<p>For example<pre>   demo :      %prelude {int i = 0;}      alternative_1   |  alternative_2   ;</pre><tt>i</tt> is visible in the sematic actions of both alternatives.<p>The rule prelude can also be used to provide code that should be executeas initialization for all alternatives.<h3>A Calculator</h3>We are now ready to turn the expression grammar into a calculator.<p>For this purpose nonterminals get an output parameter that holds thenumerical value of the phrase represented by the nonterminal.This value is compute from the numerical values of the constituents.<p>For example, in the left hand side<pre>   term&lt;n&gt; :</pre><tt>term</tt> gets an attribute <tt>n</tt>.<p>In the right hand side<pre>     term&lt;x&gt; '+' factor&lt;y&gt; { *n = x+y; }</pre>the nonterminal <tt>term</tt> gets an attribute <tt>x</tt>and the nonterminal <tt>factor</tt> gets an attribute <tt>y</tt>.The attribute of the left hand side is the computed as the sumof <tt>x</tt> and <tt>y</tt>.<p>Here is the complete grammar:<pre>   %token NUMBER;   expression :     term&lt;n&gt; { printf("%d\n", n); }   ;   term&lt;n&gt; :     term&lt;x&gt; '+' factor&lt;y&gt; { *n = x+y; }   | term&lt;x&gt; '-' factor&lt;y&gt; { *n = x-y; }   | factor&lt;n&gt;   ;   factor&lt;n&gt; :     factor&lt;x&gt; '*' primary&lt;y&gt; { *n = x*y; }   | factor&lt;x&gt; '/' primary&lt;y&gt; { *n = x/y; }   | primary&lt;n&gt;   ;   primary&lt;n&gt; :     NUMBER&lt;n&gt;   | '(' term&lt;n&gt; ')'   | '-' primary&lt;x&gt; { *n = -x; }   ;</pre>See<a href="usage.html"><i>Using the Accent Compiler Compiler</i></a>how to process this specification to obtain a functioning calculator.<h1><a name="abbreviate">How to Abbreviate Specifications</a></h1><h3>Extended Backus Naur Form</h3>So far we have only considered grammars where members of alternativesare nonterminal and terminal symbols. A formalism of this kindwas used in the Algol 60 report and named after the editorsof that document: Backus Naur Form.<p>Accent also supports a notation that is known as Extended Backus Naur Form.In this formalism one can write structured members to specifylocal alternatives and optional and repetitive elements.<h3>Local Alternatives</h3>A member of the form<pre>   ( alt_1 | ... | alt_n )</pre>can be used to specify alternative representations of a memberwithout introducing a new nonterminal.<p>For example, instead of<pre>   signed_number :      sign NUMBER   ;   sign :      '+'   |  '-'   ;</pre>one can write<pre>   signed_number :      ( '+' | '-' ) NUMBER   ;</pre>Semantic actions may be inserted.The actions of the selected alternative are executed.<p>For example,<pre>   signed_number&lt;r&gt; :      { int s; }      ( '+' { s = +1; } | '-' { s = -1; } ) NUMBER&lt;n&gt;      { *r = s*n; }   ;</pre><h3>Optional Elements</h3>A member can also have the form<pre>   ( M_1 ... M_n )?</pre>in which case the enclosed items <tt>M_1</tt>,  ... , <tt>M_n</tt>may appear in the input or not.<p>For example,<pre>   integer :      ( sign )? NUMBER   ;</pre>specifies specifies that <tt>integer</tt> is a <tt>NUMBER</tt>preceded by an optional <tt>sign</tt>.<p>So both<pre>   123</pre>and<pre>   + 123</pre>are valid phrases for <tt>integer</tt>.<p>More than one alternative may be specified between "<tt>(</tt>" and "<tt>)?</tt>":<pre>   ( alt_1 | ... | alt_n )?</pre>For example,<pre>	    integer :       ( '+' | '-' )? NUMBER    ;</pre>specifies that an <tt>integer</tt> is a <tt>NUMBER</tt>that is optionally preceded by either a "<tt>+</tt>" or a "<tt>-</tt>".<p>In case of semantic actions, proper initialization is required,because none of the alternative may be processed:<pre>   integer&lt;r&gt; :      { int s = +1; }      ( '+' | '-' { s = -1; } )? NUMBER&lt;n&gt;      { *r = s*n; }   ;</pre><h3>Repetitive Elements</h3>A further form of a member is<pre>   ( M_1 ... M_n )*</pre>in which case the enclosed items <tt>M_1</tt>,  ... , <tt>M_n</tt>may be repeated an arbitrary number of times (including zero).<p>For example,<pre>   number_list :      NUMBER  ( ',' NUMBER )*   ;</pre>specifies that a <tt>number_list</tt> is given by at least one <tt>NUMBER</tt>which is followed by arbitrary number of comma-separated <tt>NUMBER</tt>s.<p>Semantic action inside repetions are executed as often as there are instances.<p>For example,<pre>   number_list :      NUMBER&lt;sum&gt;  ( ',' NUMBER&lt;next&gt; { sum += next;} )*      { printf("%d\n", sum); }   ;</pre>adds all the numbers and prints their sum.<p>Again, several alternatives may specified:<pre>   ( alt_1 | ... | alt_n )*</pre>For example,<pre>   statements :      ( simple_statement | structured_statement )*   ;</pre><tt>statements</tt> matches an arbitrary number of statements,each of which may be a <tt>simple_statement</tt> or a<tt>structured_statement</tt>.<h1><a name="resolve">How to Resolve Ambiguities</a></h1><h3>Ambiguities</h3>The phrase structure of a source text determines the actions that areexecuted. There are grammars, where the same source textcan have different phrase structures.Such grammars are called ambiguous.<p>As an example consider this rule from the C language report:<pre>   selection_statement :     IF '(' expression ')' statement   | IF '(' expression ')' statement ELSE statement   | SWITCH '(' expression ')' statement   ;</pre>For the source text<pre>   if (x) if (y) f(); else g();</pre>two interpretations are possible.One treats the text like<pre>   if (x) { if (y) f(); else g(); }</pre>the other one like<pre>   if (x) { if (y) f(); } else g();</pre>In the first case <tt>g()</tt> is invoked if<tt>x</tt> is true and <tt>y</tt> is false.In the second case <tt>g()</tt> is invoked if<tt>x</tt> is false.<p>The intended interpretation could be specified by an unambiguous grammar(as has been done in the case of Java). But such a grammar would be moreclumsy because the problem can not be solved locally.The C report uses an ambiguous grammar and states:"<i>The <tt>else</tt> ambiguity is resolved by connecting an <tt>else</tt>with the last-encountered <tt>else</tt>-less <tt>if</tt>at the same block nesting level.</i>"<p>In traditional systems ambiguous grammars lead to violations ofthe constraints of the underlying grammar classes (all LL(1) and all LALR(1)grammars are unambiguous). The result is a conflict between parser actions(such as a "shift/reduce" conflict in Yacc,these conflicts are also possible if the grammar is unambiguous).Such systems allow the user to resolve the parser conflictsor take a default action.<p>Since Accent has been designed so that it can be used without knowledge ofparser implementation, we have developed an annotation framework that allowsthe user to resolve ambiguities at the abstract level of the grammar.Moreover, this framework is complete, it allows to resolve every ambiguity.<p>For this purpose we have to give a complete classification of ambiguities.We distinguish ambiguities between alternativesand ambiguities inside alternatives.<p>It is undecidable whether a given grammar is unambiguous.But an ambiguity can be detected at runtime.In this case the Accent Runtime prints a detailed analysisand gives an indication how to resolve the ambiguity via an annotation.If the annotated grammar is processed by Accent, the parser selectsthe phrase structure that has been specified by the user.<h3>Ambiguities Between Alternatives</h3>Here is a simple example for an ambiguity between alternatives:<pre>   01  Start:   02   'x' N 'z'   03  ;   04   05  N:   06    A { printf("A selected\n"); }   07  | B { printf("B selected\n"); }   08  ;   09   10  A: 'y' ;   11  B: 'y' ;</pre>For the input<pre>   x y z</pre>there are two possible derivations for <tt>N</tt> since both,<tt>A</tt> and <tt>B</tt>, can produce a "<tt>y</tt>".Hence there are two possible outputs for the same input:<pre>   A selected</pre>and<pre>   B selected</pre>When confronted with that input the parser emits the following diagnostics:<pre>   GRAMMAR DEBUG INFORMATION   Grammar ambiguity detected.   Two different ``N'' derivation trees for the same phrase.   TREE 1   ------   N alternative at line 6, col 3 of grammar {     A alternative at line 10, col 4 of grammar {       'y'     }   }   TREE 2   ------   N alternative at line 7, col 3 of grammar {     B alternative at line 11, col 4 of grammar {       'y'     }   }   Use %prio annotation to select an alternative.   END OF GRAMMAR DEBUG INFORMATION</pre>A "<tt>%prio</tt>" annotation can be used to give an alternativea certain priority.It is written in the form<pre>   %prio number</pre>and attached at the end of an alternative. The alternative then gets thepriority <tt>number</tt>.<p>When two different rules  can produce the same string, both must havea "<tt>%prio</tt>" annotation. The alternative with the higher priorityis selected.<p>To select the second alternative for <tt>N</tt> we give it thepriority <tt>2</tt> and assign <tt>1</tt> to the first alternative.<pre>   N:     A { printf("A selected\n"); } %prio 1   | B { printf("B selected\n"); } %prio 2   ;</pre>Now the ambiguity is resolved and the output of the generated program is:<pre>   B selected</pre>In the same style we can resolve the <tt>else</tt> ambiguity:<pre>   selection_statement :     IF '(' expression ')' statement %prio 1   | IF '(' expression ')' statement ELSE statement %prio 2   | SWITCH '(' expression ')' statement   ;</pre>It is not necessary to provide a priority for the third alternative becauseit is not involved in the ambiguity.<h3>Ambiguities Inside Alternatives</h3>We have seen that a derivation is not unique if one can replace a subtreeby a different one that produces the same string.But there is also another source of ambiguity that is less obvious.<p>In this kind of ambiguity inside the same alternativetwo neighbor derivations are replaced by different derivationsthat produce different strings.But together the new derivations produce the same string than the old ones.<p>It is clear that the string produced by a new derivation must havea different length than the string produced by an old derivation.The ambiguity can be resolved by specifying whetherthe short or the long string should be preferred.<p>Here is an example:<pre>   01  N :   02    'x' L R 'z'   03  ;   04     05  L :   06    'a'     { printf("( a )");   }   07  | 'a' 'b' { printf("( a b )"); }   08  ;   09     10  R :   11    'c'     { printf("( c )\n");   }   12  | 'b' 'c' { printf("( b c )\n"); }   13  ;</pre>To parse the input<pre>   x a b c z</pre>one can either use the first alternative of <tt>L</tt> and the secondalternative of <tt>R</tt> or the second alternative of <tt>L</tt>and the first alternative of <tt>R</tt>.<p>The diagnostic message produced by the parser is:<pre>   GRAMMAR DEBUG INFORMATION   Grammar ambiguity detected.   There are two different parses   for the beginning of ``N'',   alternative at line 2, col 3 of grammar,   up to and containing ``R'' at line 2, col 9 of grammar.   PARSE 1   -------   'x'   L alternative at line 6, col 3 of grammar {     'a'   }   R alternative at line 12, col 3 of grammar {     'b'     'c'   }   PARSE 2   -------   'x'   L alternative at line 7, col 3 of grammar {     'a'     'b'   }   R alternative at line 11, col 3 of grammar {     'c'   }   For ``R'' at line 2, col 9 of grammar,   use %long annotation to select first parse,   use %short annotation to select second parse.   END OF GRAMMAR DEBUG INFORMATION</pre>The "<tt>%long</tt>" annotation in front of a nonterminalindicates that we prefer that the nonterminal produces a longer string.<p>If we prefix the nonterminal symbol <tt>R</tt> with <tt>%long</tt> as in<pre>   N :     'x' L %long R 'z'   ;</pre>the ambiguity is resolved and we get the output<pre>   ( a )( b c )</pre>The "<tt>%short</tt>" annotation indicates that we prefer the shorter string.If we prefix <tt>R</tt> with <tt>%short</tt> as in<pre>   N :     'x' L %short R 'z'   ;</pre>we get the output<pre>   ( a b )( c )</pre>      <!--- end main content -->      <br>      <br>      <font face="helvetica" size="1">      <a href="http://accent.compilertools.net">accent.compilertools.net</a>      </font>      </TD>   </TR></TABLE></BODY></HTML>
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?
tutorial.html

tutorial.html - 源码说明

⌨️ 快捷键说明