changes_from_133.txt

From "OAA framework software released by SRI International" · text/code · 1,744 lines

    the value used by antlr, dlg, and sorcerer has also been raised to
    32,000.

#259. (MR22) Default function arguments in C++ mode.

    If a rule is declared:

            rr [int i = 0] : ....

    then the declaration generated by pccts resembles:

            void rr(int i = 0);

    however, the definition must omit the default argument:

            void rr(int i) {...}

    In the past the default value was not omitted.  In MR22
    the generated code resembles:

            void rr(int i /* = 0 */ ) {...}

    Implemented by Volker H. Simonis (simonis informatik.uni-tuebingen.de)


    Note: In MR23 this was changed so that nested C style comments
    ("/* ... */") would not cause problems.
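    A minimal hand-written C++ sketch of the language rule behind this
    change.  It is not actual pccts output.  The body of rr, which simply
    doubles its argument, is a placeholder invented for illustration:

```cpp
#include <cassert>

// Declaration, as it would appear in a generated header: the default
// argument is written here and only here.
int rr(int i = 0);

// Definition: repeating "= 0" here would be ill-formed, because C++ does
// not allow a default argument to be specified again in a later
// declaration of the same function.  The value survives only as a comment.
int rr(int i /* = 0 */) {
    return 2 * i;  // hypothetical placeholder body standing in for the rule's code
}
```

    Calling rr() picks up the default from the declaration.  The
    definition never restates it.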

#258. (MR22)  Using a base class for your parser

    In item #102 (MR10) the class statement was extended to allow one
    to specify a base class other than ANTLRParser for the generated
    parser.  It turned out that this was less than useful because
    the constructor still specified ANTLRParser as the base class.

    The class statement now uses the first identifier appearing after
    the ":" as the name of the base class.  For example:

        class MyParser : public FooParser {

    Generates in MyParser.h:

            class MyParser : public FooParser {

    Generates in MyParser.cpp something that resembles:

            MyParser::MyParser(ANTLRTokenBuffer *input) :
                                         FooParser(input,1,0,0,4)
            {
                token_tbl = _token_tbl;
                traceOptionValueDefault=1;    // MR10 turn trace ON
            }

    The base class constructor must have a signature similar to
    that of ANTLRParser.
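    The following C++ sketch shows the general shape of a base class that
    meets this requirement.  TokenBuffer, FooParser's accessors, and the
    parameter names (k, useInfLook, demandLook, bsetSize) are hypothetical
    stand-ins.  Check the declaration of ANTLRParser in AParser.h for the
    real constructor signature.

```cpp
#include <cassert>

// Hypothetical stand-in for ANTLRTokenBuffer; only its address is used here.
struct TokenBuffer {};

// Hypothetical user-supplied base class.  Its constructor accepts the same
// five arguments that the generated constructor passes above:
// FooParser(input, 1, 0, 0, 4).  The parameter names are assumptions.
class FooParser {
public:
    FooParser(TokenBuffer *input, int k, int useInfLook, int demandLook, int bsetSize)
        : input_(input), k_(k), useInfLook_(useInfLook),
          demandLook_(demandLook), bsetSize_(bsetSize) {}
    virtual ~FooParser() {}

    TokenBuffer *input() const { return input_; }
    int lookahead() const { return k_; }
    int bsetSize() const { return bsetSize_; }
    bool infiniteLookahead() const { return useInfLook_ != 0; }
    bool demandLookahead() const { return demandLook_ != 0; }

private:
    TokenBuffer *input_;
    int k_, useInfLook_, demandLook_, bsetSize_;
};

// Roughly what the generated parser class looks like once the base class
// named after the ":" is also used in the constructor's initializer list.
class MyParser : public FooParser {
public:
    explicit MyParser(TokenBuffer *input) : FooParser(input, 1, 0, 0, 4) {}
};
```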

#257. (MR21a) Removed dlg message stating that -i has no effect in C++ mode.

    This was incorrect.

#256. (MR21a) Malformed syntax graph causes crash after error message.

    In the past, certain kinds of errors in the very first grammar
    element could cause the construction of a malformed graph 
    representing the grammar.  This would eventually result in a
    fatal internal error.  The code has been changed to be more
    resistant to this particular error.

#255. (MR21a) ParserBlackBox(FILE* f) 

    This constructor set openByBlackBox to the wrong value.

    Reported by Kees Bakker (kees_bakker tasking.nl).
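    The following sketch illustrates the ownership rule that a flag like
    openByBlackBox presumably records.  The object closes the stream only
    if it opened that stream itself, so a FILE* supplied by the caller must
    be left open.  InputHolder and its members are hypothetical.  They are
    not the ParserBlackBox interface.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical class illustrating an "opened by me" flag: the stream is
// closed in the destructor only when this object opened it.
class InputHolder {
public:
    // The caller opened the stream, so the caller remains responsible for it.
    explicit InputHolder(std::FILE *f) : stream_(f), openedByUs_(false) {}

    // This object opens the file itself, so it must also close it.
    explicit InputHolder(const char *path)
        : stream_(std::fopen(path, "r")), openedByUs_(stream_ != nullptr) {}

    ~InputHolder() {
        if (openedByUs_) std::fclose(stream_);
    }

    // Copying would lead to a double fclose, so it is disabled.
    InputHolder(const InputHolder &) = delete;
    InputHolder &operator=(const InputHolder &) = delete;

    bool openedByUs() const { return openedByUs_; }
    std::FILE *stream() const { return stream_; }

private:
    std::FILE *stream_;
    bool openedByUs_;
};
```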

#254. (MR21a) Reporting syntax error at end-of-file

    When there was a syntax error at the end-of-file the syntax
    error routine would substitute "<eof>" for the programmer's
    end-of-file symbol.  This substitution is now done only when
    the programmer does not define his own end-of-file symbol
    or the symbol begins with the character "@".

    Reported by Kees Bakker (kees_bakker tasking.nl).
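    The rule above can be written as a small predicate.  eofDisplayName is
    a hypothetical helper.  A null pointer stands in for "the grammar
    defines no end-of-file symbol of its own":

```cpp
#include <cassert>
#include <string>

// Hypothetical helper implementing the MR21a rule: show "<eof>" only when
// the grammar has no end-of-file symbol of its own (modelled as a null
// pointer) or when that symbol begins with '@'.
std::string eofDisplayName(const char *userEofName) {
    if (userEofName == nullptr || userEofName[0] == '@')
        return "<eof>";
    return userEofName;  // otherwise keep the programmer's own name
}
```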

#253. (MR21) Generation of block preamble (-preamble and -preamble_first)

        *** This change was rescinded by item #263 ***

    The antlr option -preamble causes antlr to insert the code
    BLOCK_PREAMBLE at the start of each rule and block.  It does
    not insert code before rule references, token references, or
    actions.  By properly defining the macro BLOCK_PREAMBLE the
    user can generate code which is specific to the start of blocks.

    The antlr option -preamble_first is similar, but inserts the
    code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
    PreambleFirst_123 is equivalent to the first set defined by
    the #FirstSetSymbol described in Item #248.

    I have not investigated how these options interact with guess
    mode (syntactic predicates).

#252. (MR21) Check for null pointer in trace routine

    When some trace options are used with a parser that was generated
    without tracing enabled, the current rule name may be a NULL
    pointer.  A guard was added to restoreState to check for this.

    Reported by Douglas E. Forester (dougf projtech.com).
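    The following is a minimal sketch of this kind of guard.  TraceState,
    traceLine, and the "<unknown rule>" placeholder are hypothetical.
    Passing a NULL pointer to printf("%s") or to a std::string constructor
    is undefined behaviour, which is why the check matters:

```cpp
#include <cassert>
#include <string>

// Hypothetical trace bookkeeping; these names are illustrative and are
// not part of the pccts runtime.
struct TraceState {
    const char *currentRuleName;  // may be NULL when tracing was not generated
};

// Build the line a trace routine would print.  The NULL check is the guard:
// without it, std::string(nullptr) (or printf("%s", NULL)) is undefined.
std::string traceLine(const TraceState &state, const char *event) {
    assert(event != nullptr);
    const char *name = (state.currentRuleName != nullptr)
                           ? state.currentRuleName
                           : "<unknown rule>";
    return std::string(event) + " " + name;
}
```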

#251. (MR21) Changes to #define zzTRACE_RULES

    The macro zzTRACE_RULES was being used to pass information to
    AParser.h.  If this preprocessor symbol was not properly
    set the first time AParser.h was #included, the declaration
    of zzTRACEdata would be omitted (it is used by the -gd option).
    Subsequent #includes of AParser.h would be skipped because of 
    the #ifdef guard, so the declaration of zzTracePrevRuleName would
    never be made.  As a result, whether the code compiled correctly
    depended on the order of the #includes.

    The declaration of zzTRACEdata was made unconditional and the
    problem of removing unused declarations will be left to optimizers.
    
    Diagnosed by Douglas E. Forester (dougf projtech.com).

#250. (MR21) Option for EXPERIMENTAL change to error sets for blocks

    The antlr option -mrblkerr turns on an experimental feature
    which is supposed to provide more accurate syntax error messages
    for k=1, ck=1 grammars.  When used with k>1 or ck>1 grammars the
    behavior should be no worse than the current behavior.

    There is no problem with the matching of elements or the computation
    of prediction expressions in pccts.  The task is only one of listing
    the most appropriate tokens in the error message.  The error sets used
    in pccts error messages are approximations of the exact error set when
    optional elements in (...)* or (...)+ are involved.  The messages
    always report a genuine error, but the list of expected tokens is
    sometimes not exact.

    There is also a minor philosophical issue.  For example, suppose the
    grammar expects an optional A followed by Z, and the next token is X.
    X, of course, is neither A nor Z, so an error message is appropriate.
    Is it appropriate to say "Expected Z"?  It is correct, it is accurate,
    but it is not complete.

    When k>1 or ck>1 the problem of providing the exactly correct
    list of tokens for the syntax error messages ends up becoming
    equivalent to evaluating the prediction expression for the
    alternatives twice. However, for k=1 ck=1 grammars the prediction
    expression can be computed easily and evaluated cheaply, so I
    decided to try implementing it to satisfy a particular application.
    This application uses the error set in an interactive command language
    to provide prompts which list the alternatives available at that
    point in the parser.  The user can then enter additional tokens to
    complete the command line.  This required more accurate error
    sets than pccts previously provided.

    In some cases the default pccts behavior may lead to more robust error
    recovery or clearer error messages than having the exact set of tokens.
    This is because (a) features like -ge allow the use of symbolic names for
    certain sets of tokens, so having extra tokens may simply obscure things,
    and (b) the error set is used to resynchronize the parser, so a good
    choice is sometimes more important than having the exact set.

    Consider the following example:

            Note:  All example code has been abbreviated
            to the absolute minimum in order to make the
            examples concise.

        star1 : (A)* Z;

    The generated code resembles:

           old                new (with -mrblkerr)
        -------------         --------------------
        for (;;) {            for (;;) {
            if (!A) break;        if (!A) break;
            match(A);             match(A);
        }                     }
        match(Z);             if (!A and !Z) {
                                  FAIL(...{A,Z}...);
                              }
                              match(Z);


        With input X
            old message: Found X, expected Z
            new message: Found X, expected A, Z

    For the example:

        star2 : (A|B)* Z;

           old                      new (with -mrblkerr)
        -------------               --------------------
        for (;;) {                  for (;;) {
          if (!A and !B) break;       if (!A and !B) break;
          if (...) {                  if (...) {
            <same ...>                  <same ...>
          }                           }
          else {                      else {
            FAIL(...{A,B,Z}...);        FAIL(...{A,B}...);
          }                           }
        }                           }
        match(Z);                   if (!A and !B and !Z) {
                                        FAIL(...{A,B,Z}...);
                                    }
                                    match(Z);

        With input X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z
        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z

            This includes the choice of looping back to the
            star block.
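    The difference between the two error sets for star blocks can be
    reproduced with a toy model.  Tokens are single characters and '$'
    marks end of input.  parseStar and errorMessage are hypothetical
    helpers that only imitate the generated code in the two tables above:

```cpp
#include <cassert>
#include <string>

// Format "Found X, expected A, B, Z" from the offending token and error set.
std::string errorMessage(char found, const std::string &expected) {
    std::string msg = "Found ";
    msg += found;
    msg += ", expected ";
    for (std::string::size_type i = 0; i < expected.size(); ++i) {
        if (i > 0) msg += ", ";
        msg += expected[i];
    }
    return msg;
}

// Toy model of "rule : (loopSet)* follow ;".  Returns "" on success,
// otherwise the syntax error message.  mrblkerr selects the new error set.
std::string parseStar(const std::string &loopSet, char follow,
                      const std::string &input, bool mrblkerr) {
    std::string::size_type pos = 0;
    // Current lookahead token; '$' stands for end of input (an assumption).
    auto la = [&]() { return pos < input.size() ? input[pos] : '$'; };

    // (...)* : keep looping while the lookahead can start the block.
    while (loopSet.find(la()) != std::string::npos) ++pos;

    // Old behavior reports only the follow token; with -mrblkerr the tokens
    // that could loop back into the block are listed too.
    const std::string expected =
        mrblkerr ? loopSet + follow : std::string(1, follow);
    if (la() != follow) return errorMessage(la(), expected);
    ++pos;  // match(follow)
    return "";
}
```

    Feeding it the inputs from the tables reproduces the listed messages.
    For instance, parseStar("A", 'Z', "X", true) gives "Found X, expected
    A, Z", as in the star1 example.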

    The code for plus blocks:

        plus1 : (A)+ Z;

    The generated code resembles:

           old                  new (with -mrblkerr)
        -------------           --------------------
        do {                    do {
          match(A);               match(A);
        } while (A);            } while (A);
        match(Z);               if (!A and !Z) {
                                  FAIL(...{A,Z}...);
                                }
                                match(Z);

        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, Z

            This includes the choice of looping back to the
            plus block.

    For the example:

        plus2 : (A|B)+ Z;

           old                    new (with -mrblkerr)
        -------------             --------------------
        do {                        do {
          if (A) {                    <same>
            match(A);                 <same>
          } else if (B) {             <same>
            match(B);                 <same>
          } else {                    <same>
            if (cnt > 1) break;       <same>
            FAIL(...{A,B,Z}...);        FAIL(...{A,B}...);
          }                           }
          cnt++;                      <same>
        }                           }

        match(Z);                   if (!A and !B and !Z) {
                                        FAIL(...{A,B,Z}...);
                                    }
                                    match(Z);

        With input X
            old message: Found X, expected A, B, Z
            new message: Found X, expected A, B
        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z

            This includes the choice of looping back to the
            plus block.
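    The same toy model can be extended to plus blocks.  For the first
    iteration it follows the plus2 table (old: {A,B,Z}, new: {A,B}).  A
    single-token block such as plus1 would instead fail inside match(A) in
    the old code.  parsePlus and errorMessage are hypothetical helpers:

```cpp
#include <cassert>
#include <string>

// Format "Found X, expected A, B, Z" from the offending token and error set.
std::string errorMessage(char found, const std::string &expected) {
    std::string msg = "Found ";
    msg += found;
    msg += ", expected ";
    for (std::string::size_type i = 0; i < expected.size(); ++i) {
        if (i > 0) msg += ", ";
        msg += expected[i];
    }
    return msg;
}

// Toy model of "rule : (loopSet)+ follow ;".  '$' stands for end of input.
std::string parsePlus(const std::string &loopSet, char follow,
                      const std::string &input, bool mrblkerr) {
    std::string::size_type pos = 0;
    auto la = [&]() { return pos < input.size() ? input[pos] : '$'; };

    // (...)+ : the block must be entered at least once.
    int iterations = 0;
    while (loopSet.find(la()) != std::string::npos) {
        ++pos;
        ++iterations;
    }
    if (iterations == 0) {
        // Old (plus2 table): block tokens plus follow.  New: block tokens only.
        return errorMessage(la(), mrblkerr ? loopSet : loopSet + follow);
    }

    // After at least one iteration the choices are "loop again" or follow.
    const std::string expected =
        mrblkerr ? loopSet + follow : std::string(1, follow);
    if (la() != follow) return errorMessage(la(), expected);
    ++pos;  // match(follow)
    return "";
}
```

    For example, parsePlus("AB", 'Z', "X", true) gives "Found X, expected
    A, B", the new message listed for plus2.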
    
#249. (MR21) Changes for DEC/VMS systems

    Jean-François Piéronne (jfp altavista.net) has updated some
    VMS related command files and fixed some minor problems related
    to building pccts under the DEC/VMS operating system.  For DEC/VMS
    users the most important differences are:

        a.  Revised makefile.vms
        b.  Revised genMMS for generating VMS style makefiles.

#248. (MR21) Generate symbol for first set of an alternative

    pccts can generate a symbol which represents the tokens which may
    appear at the start of a block:

        rr : #FirstSetSymbol(rr_FirstSet)  ( Foo | Bar ) ;

    This will generate the symbol rr_FirstSet of type SetWordType with
    elements Foo and Bar set. The bits can be tested using code similar 
    to the following:

        if (set_el(Foo, &rr_FirstSet)) { ...

    This can be combined with the C array zztokens[] or the C++ routine
    tokenName() to get the print name of the token in the first set.
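    The following sketch tests such a set and lists its members.  It
    assumes one bit per token (byte token/8, bit token%8) and that
    SetWordType is unsigned char.  The bit order, the token numbers, and
    the tokenNames table (standing in for zztokens[] or tokenName()) are
    assumptions, not a layout pccts guarantees:

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef unsigned char SetWordType;  // assumption: one byte per set word

// Hypothetical token numbers and print names (the role played by
// zztokens[] in C mode or tokenName() in C++ mode).
enum Token { Epsilon = 0, Eof = 1, Foo = 2, Bar = 3, Baz = 4 };
const int kNumTokens = 5;
const char *const tokenNames[kNumTokens] = {"<epsilon>", "<eof>", "Foo", "Bar", "Baz"};

// One bit per token: byte token / 8, bit token % 8 (assumed bit order).
bool setContains(const SetWordType *set, int token) {
    assert(token >= 0 && token < kNumTokens);
    return (set[token / 8] & (1u << (token % 8))) != 0;
}

// Roughly what #FirstSetSymbol(rr_FirstSet) ( Foo | Bar ) describes:
// a set whose only members are Foo and Bar.
const SetWordType rr_FirstSet[(kNumTokens + 7) / 8] = {
    static_cast<SetWordType>((1u << Foo) | (1u << Bar))
};

// Collect the print names of all tokens in a set, e.g. for a prompt.
std::vector<std::string> memberNames(const SetWordType *set) {
    std::vector<std::string> names;
    for (int t = 0; t < kNumTokens; ++t)
        if (setContains(set, t)) names.push_back(tokenNames[t]);
    return names;
}
```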

    The size of the set is given by the newly added enum SET_SIZE, a 
    protected member of the generated parser's class.  The number of
    elements in the generated set will not be exactly equal to the 
    value of SET_SIZE because of synthetic tokens created by #tokclass,
    #errclass, the -ge option, and meta-tokens such as epsilon and
    end-of-file.

    The #FirstSetSymbol must appear immediately before a block
    such as (...)+, (...)*, {...}, or (...).  It may not appear
    immediately before a token, a rule reference, or an action.  However,
    a token or rule reference can be enclosed in a (...) in order to
    make the use of #FirstSetSymbol legal.

            rr_bad : #FirstSetSymbol(rr_bad_FirstSet) Foo;   //  Illegal

            rr_ok :  #FirstSetSymbol(rr_ok_FirstSet) (Foo);  //  Legal
    
    Do not confuse FirstSetSymbol sets with the sets used for testing
    lookahead. The sets used for FirstSetSymbol use one bit per element,
    so the number of bytes is approximately the largest token number
    divided by 8.  The sets used for testing lookahead store 8 lookahead 
    sets per byte, so the length of the array is approximately the largest
    token number.
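    By contrast, a lookahead-set table of the kind described above can be
    sketched as one byte per token, where bit i of entry t records whether
    token t belongs to set i.  The table contents, names, and token numbers
    are hypothetical:

```cpp
#include <cassert>

typedef unsigned char SetWordType;  // assumption: one byte per entry

// Hypothetical token numbers.
enum Token { Epsilon = 0, Eof = 1, Foo = 2, Bar = 3, Baz = 4 };
const int kNumTokens = 5;

// One entry per token; bit i of entry t is set when token t is in set i.
// Up to 8 lookahead sets share this one array.  Here:
//   set 0 = {Foo, Bar}, set 1 = {Bar, Baz}
const SetWordType lookaheadSets[kNumTokens] = {
    0x0,  // Epsilon
    0x0,  // Eof
    0x1,  // Foo: set 0 only
    0x3,  // Bar: sets 0 and 1
    0x2,  // Baz: set 1 only
};

bool inLookaheadSet(int token, int setIndex) {
    assert(token >= 0 && token < kNumTokens);
    assert(setIndex >= 0 && setIndex < 8);
    return (lookaheadSets[token] & (1u << setIndex)) != 0;
}
```

    With this layout the array length tracks the largest token number,
    while the FirstSetSymbol layout needs only about that number divided
    by 8 bytes.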

    If there is demand, a similar routine for follow sets can be added.

#247. (MR21) Misleading error message on syntax error for optional elements.

        ===================================================
        The behavior has been revised when parser exception
        handling is used.  See Item #290
        ===================================================

    Prior to MR21, tokens which were optional did not appear in syntax
    error messages if the block which immediately followed detected a 
    syntax error.

    Consider the following grammar which accepts Number, Word, and Other:

            rr : {Number} Word;

    For this rule the code resembles:

            if (LA(1) == Number) {
                match(Number);
                consume();
