changes_from_133_before_mr13.txt

    ------------------------------------------------------------
           This is the second part of a two part file.
      This is a list of changes to pccts 1.33 prior to MR13
       For more recent information see CHANGES_FROM_133.txt
    ------------------------------------------------------------

                               DISCLAIMER

 The software and these notes are provided "as is".  They may include
 typographical or technical errors and their authors disclaim all
 liability of any kind or nature for damages due to error, fault,
 defect, or deficiency regardless of cause.  All warranties of any
 kind, either express or implied, including, but not limited to, the
 implied warranties of merchantability and fitness for a particular
 purpose are disclaimed.


#153. (Changed in MR12b) Bug in computation of -mrhoist suppression set

      Consider the following grammar with k=1 and "-mrhoist on":

            r1  : (A)? => <<p>>? x      /* l1 */
                | r2                    /* l2 */
                ;
            r2  :  A                    /* l4 */
                | (B)? => <<q>>? y      /* l5 */
                ;

      In earlier versions the mrhoist routine would see that both l1 and
      l2 contained predicates and would assume that this prevented either
      from acting to suppress the other predicate.  In the example above
      it didn't realize the A at line l4 is capable of suppressing the
      predicate at l1 even though alt l2 contains (indirectly) a predicate.

      This is fixed in MR12b.

      Reported by Reinier van den Born (reinier@vnet.ibm.com)

#153. (Changed in MR12a) Bug in computation of -mrhoist suppression set

      An oversight similar to that described in Item #152 appeared in
      the computation of the set that "covered" a predicate.  If a
      predicate expression included a term such as p=AND(q,r) the context
      of p was taken to be context(q) & context(r), when it should have
      been context(q) | context(r).  This is fixed in MR12a.
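
      The corrected rule can be illustrated with a small C++ sketch.  It
      is not taken from the antlr source: PredNode and context() are
      hypothetical names used only to model a predicate expression tree
      in which each leaf predicate carries a set of context tokens.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a predicate expression tree (not antlr's own
// data structure).
struct PredNode {
    enum Kind { LEAF, AND, OR } kind;
    std::set<std::string> leafContext;   // context tokens of a LEAF predicate
    std::vector<PredNode> children;      // operands of an AND / OR node
};

// Context that "covers" a predicate expression.  For AND as well as OR
// it is the union of the operand contexts, never the intersection.
std::set<std::string> context(const PredNode &n) {
    if (n.kind == PredNode::LEAF) return n.leafContext;
    assert(!n.children.empty());         // AND / OR need at least one operand
    std::set<std::string> result;
    for (const PredNode &child : n.children) {
        const std::set<std::string> c = context(child);
        result.insert(c.begin(), c.end());
    }
    return result;
}
```

      For p=AND(q,r) with context(q)={AAA} and context(r)={BBB} the
      sketch yields {AAA, BBB}, whereas an intersection would be empty.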

#152. (Changed in MR12) Bug in generation of predicate expressions

      The primary purpose for MR12 is to make quite clear that MR11 is
      obsolete and to fix the bug related to predicate expressions.

      In MR10, code was added to optimize the code generated for
      predicate expression tests.  Unfortunately, a significant
      oversight in that change caused incorrect code to be generated
      for predicate expression tests which contained predicates
      combined using AND:

            r0 : (r1)* "@" ;
            r1 : (AAA)? => <<p LATEXT(1)>>? r2 ;
            r2 : (BBB)? => <<q LATEXT(1)>>? Q
               | (BBB)? => <<r LATEXT(1)>>? Q
               ;

      In MR11 (and MR10 when using "-mrhoist on") the code generated
      for r0 to predict r1 would be equivalent to:

        if ( LA(1)==Q &&
                (LA(1)==AAA && LA(1)==BBB) &&
                    ( p && ( q || r )) ) {

      This is incorrect because it expresses the idea that LA(1)
      *must* be AAA in order to attempt r1, and *must* be BBB to
      attempt r2.  The result was that r1 became unreachable since
      both conditions cannot be simultaneously true.

      The general philosophy of code generation for predicates
      can be summarized as follows:

            a. If the context is true don't enter an alt
               for which the corresponding predicate is false.

               If the context is false then it is okay to enter
               the alt without evaluating the predicate at all.

            b. A predicate created by ORing of predicates has
               context which is the OR of their individual contexts.

            c. A predicate created by ANDing of predicates has
               (surprise) context which is the OR of their individual
               contexts.

            d. Apply these rules recursively.

            e. Remember rule (a)

      The correct code should express the idea that *if* LA(1) is
      AAA then p must be true to attempt r1, but if LA(1) is *not*
      AAA then it is okay to attempt r1, provided that *if* LA(1) is
      BBB then one of q or r must be true.

        if ( LA(1)==Q &&
                ( !(LA(1)==AAA || LA(1)==BBB) ||
                    ( !(LA(1)==AAA) || p) &&
                    ( !(LA(1)==BBB) || q || r ) ) ) {

      I believe this is fixed in MR12.

      Reported by Reinier van den Born (reinier@vnet.ibm.com)
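
      The difference between the two tests can be checked with a short
      C++ sketch.  It models only the predicate part of each test (the
      leading lookahead term LA(1)==Q is left out), and the function
      and enum names are hypothetical, chosen for this illustration.

```cpp
#include <cassert>

enum Token { AAA, BBB, Q, OTHER };

// Predicate part of the MR11 test: the two contexts are ANDed, so the
// expression is false for every token.
bool buggyPredicateTest(Token la, bool p, bool q, bool r) {
    return (la == AAA && la == BBB) && (p && (q || r));
}

// Predicate part of the corrected test: a predicate is required to
// hold only when its own context matches the lookahead token.
bool fixedPredicateTest(Token la, bool p, bool q, bool r) {
    return !(la == AAA || la == BBB) ||
           ((!(la == AAA) || p) && (!(la == BBB) || q || r));
}
```

      With la==AAA the corrected test reduces to p, with la==BBB it
      reduces to (q || r), and for any other token it is true, which
      is rule (a) applied to each context in turn.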

#151a. (Changed in MR12) ANTLRParser::getLexer()

      As a result of several requests, I have added public methods to
      get a pointer to the lexer belonging to a parser.

            ANTLRTokenStream *ANTLRParser::getLexer() const

                Returns a pointer to the lexer being used by the
                parser.  ANTLRTokenStream is the base class of
                DLGLexer.

            ANTLRTokenStream *ANTLRTokenBuffer::getLexer() const

                Returns a pointer to the lexer being used by the
                ANTLRTokenBuffer.  ANTLRTokenStream is the base
                class of DLGLexer.

      You must manually cast the ANTLRTokenStream pointer to your
      program's lexer class because the name of the lexer's class is
      not fixed.  Thus it is impossible to incorporate it into the
      DLGLexerBase class.
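
      A minimal sketch of the cast follows.  So that it compiles
      without the PCCTS headers, TokenStream and ParserStandIn are
      stand-ins for ANTLRTokenStream and ANTLRParser, and MyLexer,
      currentLine() and lexerOf() are hypothetical names; only the
      shape of getLexer() is taken from the description above.

```cpp
#include <cassert>

// Stand-in for ANTLRTokenStream (assumption: used here only as a
// polymorphic base class).
class TokenStream {
public:
    virtual ~TokenStream() {}
};

// Hypothetical user lexer; in a real project this would be the lexer
// class generated for the grammar.
class MyLexer : public TokenStream {
public:
    explicit MyLexer(int line) : line_(line) {}
    int currentLine() const { return line_; }   // hypothetical member
private:
    int line_;
};

// Stand-in for ANTLRParser, exposing only getLexer().
class ParserStandIn {
public:
    explicit ParserStandIn(TokenStream *lexer) : lexer_(lexer) {}
    TokenStream *getLexer() const { return lexer_; }
private:
    TokenStream *lexer_;
};

// The manual cast: the caller knows the concrete lexer class, the
// parser base class does not.
MyLexer *lexerOf(const ParserStandIn &parser) {
    return static_cast<MyLexer *>(parser.getLexer());
}
```

      The cast is safe only when the parser really was constructed
      with a lexer of the named class.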

#151b. (Changed in MR12) ParserBlackBox member getLexer()

      The template class ParserBlackBox now has a member getLexer()
      which returns a pointer to the lexer.

#150. (Changed in MR12) syntaxErrCount and lexErrCount now public

      See Item #127 for more information.

#149. (Changed in MR12) antlr option -info o (letter o for orphan)

      If there is more than one rule which is not referenced by any
      other rule then all such rules are listed.  This is useful for
      alerting one to rules which are not used, but which can still
      contribute to ambiguity.  For example:

            start : a Z ;
            unused: a A ;
            a     : (A)+ ;

      will cause an ambiguity report for rule "a" which will be
      difficult to understand if the user forgets about rule "unused"
      simply because it is not used in the grammar.
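
      The notion of an orphan rule can be sketched in C++ as follows.
      This is an illustration, not antlr's implementation: RuleRefs and
      orphanRules() are hypothetical names, and the grammar is reduced
      to a map from each rule to the rules it references.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Rule name -> names of the rules referenced by its alternatives.
typedef std::map<std::string, std::set<std::string> > RuleRefs;

// Returns, in alphabetical order, the rules that no other rule references.
std::vector<std::string> orphanRules(const RuleRefs &grammar) {
    std::set<std::string> referenced;
    for (const auto &rule : grammar)
        for (const auto &target : rule.second)
            if (target != rule.first)        // a self-reference does not count
                referenced.insert(target);

    std::vector<std::string> orphans;
    for (const auto &rule : grammar)
        if (referenced.count(rule.first) == 0)
            orphans.push_back(rule.first);
    return orphans;
}
```

      For the grammar above the sketch returns "start" and "unused".
      Since that is more than one rule, both would be listed.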

#148. (Changed in MR11) #token names appearing in zztokens,token_tbl

      In a #token statement like the following:

            #token Plus "\+"

      the string "Plus" appears in the zztokens array (C mode) and
      token_tbl (C++ mode).  This string is used in most error
      messages.  In MR11 one has the option of using some other string
      (e.g. "+") in those tables.

      In MR11 one can write:

            #token Plus ("+")             "\+"
            #token LP   ("(")             "\("
            #token COM  ("comment begin") "/\*"

      A #token statement is allowed to appear in more than one #lexclass
      with different regular expressions.  However, the token name appears
      only once in the zztokens/token_tbl array.  This means that only
      one substitute can be specified for a given #token name.  The second
      attempt to define a substitute name (different from the first) will
      result in an error message.
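
      The one-substitute-per-name rule can be modeled with a short C++
      sketch.  TokenNameTable and its members are hypothetical; the
      sketch only mirrors the behavior described above and is not the
      antlr implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of the single name table shared by every #lexclass.
class TokenNameTable {
public:
    // Records a substitute for a token name.  Returns false when a
    // different substitute was already recorded (the error case).
    bool setSubstitute(const std::string &token, const std::string &substitute) {
        auto inserted = table_.insert(std::make_pair(token, substitute));
        return inserted.second || inserted.first->second == substitute;
    }

    // Name to show in error messages: the substitute when one exists,
    // otherwise the token name itself.
    std::string displayName(const std::string &token) const {
        auto it = table_.find(token);
        return it == table_.end() ? token : it->second;
    }

private:
    std::map<std::string, std::string> table_;
};
```

      Repeating the same substitute for a token is accepted in this
      model; only a second, different substitute is rejected.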

#147. (Changed in MR11) Bug in follow set computation

      There is a bug in 1.33 vanilla and all maintenance releases
      prior to MR11 in the computation of the follow set.  The bug is
      different from that described in Item #82 and probably more
      common.  It was discovered in the ansi.g grammar while testing
      the "ambiguity aid" (Item #119).  The search for the bug started
      when the ambiguity aid was unable to discover the actual source
      of an ambiguity reported by antlr.

      The problem appears when an optimization of the follow set
      computation is used inappropriately.  The result is that the
      follow set used is the "worst case".  In other words, the error
      can lead to false reports of ambiguity.  The good news is that
      if you have a grammar in which you have addressed all reported
      ambiguities you are ok.  The bad news is that you may have spent
      time fixing ambiguities that were not real, or used k=2 when
      ck=2 might have been sufficient, and so on.

      The following grammar demonstrates the problem:

        ------------------------------------------------------------
            expr          :   ID ;

            start         :   stmt SEMI ;

            stmt          :   CASE expr COLON
                          |   expr SEMI
                          |   plain_stmt
                          ;

            plain_stmt    :   ID COLON ;
        ------------------------------------------------------------

      When compiled with k=1 and ck=2 it will report:

         warning: alts 2 and 3 of the rule itself ambiguous upon
                                             { IDENTIFIER }, { COLON }

      When antlr analyzes "stmt" it computes the first[1] set of all
      alternatives.  It finds an ambiguity between alts 2 and 3 for ID.
      It then computes the first[2] set for alternatives 2 and 3 to resolve
      the ambiguity.  In computing the first[2] set of "expr" (which is
      only one token long) it needs to determine what could follow "expr".
      Under a certain combination of circumstances antlr forgets that it
      is trying to analyze "stmt" which can only be followed by SEMI and
      adds to the first[2] set of "expr" the "global" follow set (including
      "COLON") which could follow "expr" (under other conditions) in the
      phrase "CASE expr COLON".
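
      The effect of substituting the global follow set can be
      reproduced with a small C++ sketch.  It is not antlr's algorithm:
      lookahead2() and overlap() are hypothetical helpers which model
      first[2] of a one-token phrase as a set of (token, follower)
      pairs.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

typedef std::pair<std::string, std::string> TokenPair;   // (token 1, token 2)
typedef std::set<TokenPair> PairSet;

// Two-token lookahead of a one-token phrase: its single token paired
// with each token that is allowed to follow it.
PairSet lookahead2(const std::string &first,
                   const std::set<std::string> &follow) {
    PairSet result;
    for (const auto &next : follow)
        result.insert(TokenPair(first, next));
    return result;
}

// True when the two lookahead sets share at least one pair, which is
// what gets reported as an ambiguity.
bool overlap(const PairSet &a, const PairSet &b) {
    for (const auto &p : a)
        if (b.count(p) != 0) return true;
    return false;
}
```

      With the follow set that applies inside alt 2 of "stmt", {SEMI},
      the pairs for alts 2 and 3 do not overlap.  With the global
      follow set of "expr", {COLON, SEMI}, they overlap on (ID, COLON),
      which corresponds to the false report.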

#146. (Changed in MR11) Option -treport for locating "difficult" alts

      It can be difficult to determine which alternatives are causing
      pccts to work hard to resolve an ambiguity.  In some cases the
      ambiguity is successfully resolved after much CPU time so there
      is no message at all.

      A rough measure of the amount of work being performed which is
      independent of the CPU speed and system load is the number of
      tnodes created.  Using "-info t" gives information about the
      total number of tnodes created and the peak number of tnodes.

        Tree Nodes:  peak 1300k  created 1416k  lost 0

      It also puts in the generated C or C++ file the number of tnodes
      created for a rule (at the end of the rule).  However this
      information is not sufficient to locate the alternatives within
      a rule which are causing the creation of tnodes.

      Using:

             antlr -treport 100000 ....

      causes antlr to list on stdout any alternatives which require the
      creation of more than 100,000 tnodes, along with the lookahead sets
      for those alternatives.

      The following is a trivial case from the ansi.g grammar which shows
      the format of the report.  This report might be of more interest
      in cases where 1,000,000 tnodes were created to resolve the ambiguity.

      -------------------------------------------------------------------------
        There were 0 tuples whose ambiguity could not be resolved
             by full lookahead
        There were 157 tnodes created to resolve ambiguity between:

          Choice 1: statement/2  line 475  file ansi.g
          Choice 2: statement/3  line 476  file ansi.g

            Intersection of lookahead[1] sets:

               IDENTIFIER

            Intersection of lookahead[2] sets:

               LPARENTHESIS     COLON            AMPERSAND        MINUS
               STAR             PLUSPLUS         MINUSMINUS       ONESCOMPLEMENT
               NOT              SIZEOF           OCTALINT         DECIMALINT
               HEXADECIMALINT   FLOATONE         FLOATTWO         IDENTIFIER
               STRING           CHARACTER
      -------------------------------------------------------------------------

#145. (Documentation)  Generation of Expression Trees

      Item #99 was misleading because it implied that the optimization
      for tree expressions was available only for trees created by
      predicate expressions and neglected to mention that it required
      the use of "-mrhoist on".  The optimization applies to tree
      expressions created for grammars with k>1 and for predicates with
      lookahead depth >1.

      In MR11 the optimized version is always used so the -mrhoist on
      option need not be specified.

#144. (Changed in MR11) Incorrect test for exception group

      In testing for a rule's exception group, the pointer to the label
      is compared against '\0'.  The intention is "*pointer".

      Reported by Jeffrey C. Fried (Jeff@Fried.net).
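
      The difference between the two comparisons can be shown with a
      short C++ sketch.  The function names are hypothetical and the
      code is not the antlr source; it only contrasts testing a pointer
      with testing the character it points to.

```cpp
#include <cassert>

// Mistaken form: compares the pointer itself with zero.  In C,
// "label == '\0'" means exactly this, i.e. a null-pointer test.
bool isEmptyLabelBuggy(const char *label) {
    return label == 0;
}

// Intended form: dereference the pointer and test the first character.
bool isEmptyLabel(const char *label) {
    assert(label != 0);          // the caller must pass a valid string
    return *label == '\0';
}
```

      For an empty but non-null string the mistaken form reports
      "not empty" while the intended form reports "empty".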

#143. (Changed in MR11) Optional ";" at end of #token statement

      Fixes problem of:

            #token X "x"

            <<
                parser action
