📄 changes_from_1.33

📁 本工具提供一个词法分析器和语法分析器的集成开发环境
💻 33
📖 第 1 页 / 共 5 页
字号:
      ambiguities you are ok.  The bad news is that you may have spent      time fixing ambiguities that were not real, or used k=2 when      ck=2 might have been sufficient, and so on.      The following grammar demonstrates the problem:        ------------------------------------------------------------            expr          :   ID ;            start         :   stmt SEMI ;            stmt          :   CASE expr COLON                          |   expr SEMI                          |   plain_stmt                          ;            plain_stmt    :   ID COLON ;        ------------------------------------------------------------      When compiled with k=1 and ck=2 it will report:         warning: alts 2 and 3 of the rule itself ambiguous upon                                             { IDENTIFIER }, { COLON }      When antlr analyzes "stmt" it computes the first[1] set of all      alternatives.  It finds an ambiguity between alts 2 and 3 for ID.      It then computes the first[2] set for alternatives 2 and 3 to resolve      the ambiguity.  In computing the first[2] set of "expr" (which is      only one token long) it needs to determine what could follow "expr".      Under a certain combination of circumstances antlr forgets that it      is trying to analyze "stmt" which can only be followed by SEMI and      adds to the first[2] set of "expr" the "global" follow set (including      "COLON") which could follow "expr" (under other conditions) in the      phrase "CASE expr COLON".#146. (Changed in MR11) Option -treport for locating "difficult" alts      It can be difficult to determine which alternatives are causing      pccts to work hard to resolve an ambiguity.  In some cases the      ambiguity is successfully resolved after much CPU time so there      is no message at all.      A rough measure of the amount of work being peformed which is      independent of the CPU speed and system load is the number of      tnodes created.  Using "-info t" gives information about the      total number of tnodes created and the peak number of tnodes.        Tree Nodes:  peak 1300k  created 1416k  lost 0      It also puts in the generated C or C++ file the number of tnodes      created for a rule (at the end of the rule).  However this      information is not sufficient to locate the alternatives within      a rule which are causing the creation of tnodes.      Using:             antlr -treport 100000 ....      causes antlr to list on stdout any alternatives which require the      creation of more than 100,000 tnodes, along with the lookahead sets      for those alternatives.      The following is a trivial case from the ansi.g grammar which shows      the format of the report.  This report might be of more interest      in cases where 1,000,000 tuples were created to resolve the ambiguity.      -------------------------------------------------------------------------        There were 0 tuples whose ambiguity could not be resolved             by full lookahead        There were 157 tnodes created to resolve ambiguity between:          Choice 1: statement/2  line 475  file ansi.g          Choice 2: statement/3  line 476  file ansi.g            Intersection of lookahead[1] sets:               IDENTIFIER            Intersection of lookahead[2] sets:               LPARENTHESIS     COLON            AMPERSAND        MINUS               STAR             PLUSPLUS         MINUSMINUS       ONESCOMPLEMENT               NOT              SIZEOF           OCTALINT         DECIMALINT               HEXADECIMALINT   FLOATONE         FLOATTWO         IDENTIFIER               STRING           CHARACTER      -------------------------------------------------------------------------#145. (Documentation)  Generation of Expression Trees      Item #99 was misleading because it implied that the optimization      for tree expressions was available only for trees created by      predicate expressions and neglected to mention that it required      the use of "-mrhoist on".  The optimization applies to tree      expressions created for grammars with k>1 and for predicates with      lookahead depth >1.      In MR11 the optimized version is always used so the -mrhoist on      option need not be specified.#144. (Changed in MR11) Incorrect test for exception group      In testing for a rule's exception group the label a pointer      is compared against '\0'.  The intention is "*pointer".      Reported by Jeffrey C. Fried (Jeff@Fried.net).#143. (Changed in MR11) Optional ";" at end of #token statement      Fixes problem of:            #token X "x"            <<                parser action            >>      Being confused with:            #token X "x" <<lexical action>>#142. (Changed in MR11) class BufFileInput subclass of DLGInputStream      Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class      BufFileInput derived from DLGInputStream which provides a      function lookahead(char *string) to test characters in the      input stream more than one character ahead.      The default amount of lookahead is specified by the constructor      and defaults to 8 characters.  This does *not* include the one      character of lookahead maintained internally by DLG in member "ch"      and which is not available for testing via BufFileInput::lookahead().      This is a useful class for overcoming the one-character-lookahead      limitation of DLG without resorting to a lexer capable of      backtracking (like flex) which is not integrated with antlr as is      DLG.      There are no restrictions on copying or using BufFileInput.* except      that the authorship and related information must be retained in the      source code.      The class is located in pccts/h/BufFileInput.* of the kit.#141. (Changed in MR11) ZZDEBUG_CONSUME for ANTLRParser::consume()      A debug aid has been added to file ANTLRParser::consume() in      file AParser.cpp:            #ifdef ZZDEBUG_CONSUME_ACTION                zzdebug_consume_action();            #endif      Suggested by Sramji Ramanathan (ps@kumaran.com).#140. (Changed in MR11) #pred to define predicates      +---------------------------------------------------+      | Note: Assume "-prc on" for this entire discussion |      +---------------------------------------------------+      A problem with predicates is that each one is regarded as      unique and capable of disambiguating cases where two      alternatives have identical lookahead.  For example:        rule : <<pred(LATEXT(1))>>? A             | <<pred(LATEXT(1))>>? A             ;      will not cause any error messages or warnings to be issued      by earlier versions of pccts.  To compare the text of the      predicates is an incomplete solution.      In 1.33MR11 I am introducing the #pred statement in order to      solve some problems with predicates.  The #pred statement allows      one to give a symbolic name to a "predicate literal" or a      "predicate expression" in order to refer to it in other predicate      expressions or in the rules of the grammar.      The predicate literal associated with a predicate symbol is C      or C++ code which can be used to test the condition.  A      predicate expression defines a predicate symbol in terms of other      predicate symbols using "!", "&&", and "||".  A predicate symbol      can be defined in terms of a predicate literal, a predicate      expression, or *both*.      When a predicate symbol is defined with both a predicate literal      and a predicate expression, the predicate literal is used to generate      code, but the predicate expression is used to check for two      alternatives with identical predicates in both alternatives.      Here are some examples of #pred statements:        #pred  IsLabel       <<isLabel(LATEXT(1))>>?        #pred  IsLocalVar    <<isLocalVar(LATEXT(1))>>?        #pred  IsGlobalVar   <<isGlobalVar(LATEXT(1)>>?        #pred  IsVar         <<isVar(LATEXT(1))>>?       IsLocalVar || IsGlobalVar        #pred  IsScoped      <<isScoped(LATEXT(1))>>?    IsLabel || IsLocalVar      I hope that the use of EBNF notation to describe the syntax of the      #pred statement will not cause problems for my readers (joke).        predStatement : "#pred"                            CapitalizedName                              (                                  "<<predicate_literal>>?"                                | "<<predicate_literal>>?"  predOrExpr                                | predOrExpr                              )                      ;        predOrExpr    : predAndExpr ( "||" predAndExpr ) * ;        predAndExpr   : predPrimary ( "&&" predPrimary ) * ;        predPrimary   : CapitalizedName                      | "!" predPrimary                      | "(" predOrExpr ")"                      ;      What is the purpose of this nonsense ?      To understand how predicate symbols help, you need to realize that      predicate symbols are used in two different ways with two different      goals.        a. Allow simplification of predicates which have been combined           during predicate hoisting.        b. Allow recognition of identical predicates which can't disambiguate           alternatives with common lookahead.      First we will discuss goal (a).  Consider the following rule:            rule0: rule1                 | ID                 | ...                 ;            rule1: rule2                 | rule3                 ;            rule2: <<isX(LATEXT(1))>>? ID ;            rule3: <<!isX(LATEXT(1)>>? ID ;      When the predicates in rule2 and rule3 are combined by hoisting      to create a prediction expression for rule1 the result is:            if ( LA(1)==ID                && ( isX(LATEXT(1) || !isX(LATEXT(1) ) ) { rule1(); ...      This is inefficient, but more importantly, can lead to false      assumptions that the predicate expression distinguishes the rule1      alternative with some other alternative with lookahead ID.  In      MR11 one can write:            #pred IsX     <<isX(LATEXT(1))>>?            ...            rule2: <<IsX>>? ID  ;            rule3: <<!IsX>>? ID ;      During hoisting MR11 recognizes this as a special case and      eliminates the predicates.  The result is a prediction      expression like the following:            if ( LA(1)==ID ) { rule1(); ...      Please note that the following cases which appear to be equivalent      *cannot* be simplified by MR11 during hoisting because the hoisting      logic only checks for a "!" in the predicate action, not in the      predicate expression for a predicate symbol.        *Not* equivalent and is not simplified during hoisting:            #pred IsX      <<isX(LATEXT(1))>>?            #pred NotX     <<!isX(LATEXT(1))>>?            ...            rule2: <<IsX>>? ID  ;            rule3: <<NotX>>? ID ;        *Not* equivalent and is not simplified during hoisting:            #pred IsX      <<isX(LATEXT(1))>>?            #pred NotX     !IsX            ...            rule2: <<IsX>>? ID  ;            rule3: <<NotX>>? ID ;      Now we will discuss goal (b).      When antlr discovers that there is a lookahead ambiguity between      two alternatives it attempts to resolve the ambiguity by searching      for predicates in both alternatives.  In the past any predicate      would do, even if the same one appeared in both alternatives:            rule: <<p(LATEXT(1))>>? X                | <<p(LATEXT(1))>>? X                ;      The #pred statement is a start towards solving this problem.      During ambiguity resolution (*not* predicate hoisting) the      predicates for the two alternatives are expanded and compared.      Consider the following example:            #pred Upper     <<isUpper(LATEXT(1))>>?            #pred Lower     <<isLower(LATEXT(1))>>?            #pred Alpha     <<isAlpha(LATEXT(1))>>?  Upper || Lower            rule0: rule1                 | <<Alpha>>? ID                 ;            rule1:                 | rule2                 | rule3                 ...                 ;            rule2: <<Upper>>? ID;            rule3: <<Lower>>? ID;      The definition of #pred Alpha expresses:            a. to test the predicate use the C code "isAlpha(LATEXT(1))"            b. to analyze the predicate use the information that               Alpha is equivalent to the union of Upper and Lower,      During ambiguity resolution the definition of Alpha is expanded      into "Upper || Lower" and compared with the predicate in the other      alternative, which is also "Upper || Lower".  Because they are      identical MR11 will report a problem.    -------------------------------------------------------------------------      t10.g, line 5: warning: the predicates used to disambiguate rule rule0             (file t10.g alt 1 line 5 and alt 2 line 6)             are identical when compared without context and may have no
💿 文件大小 493 K
👤 上传用户 pengcong2008_
📂 所属分类编译器/解释器
🏷️ 相关标签

#分析器 #语法分析器 #集成开发环境
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -