Of these conflicts, the ones that most C++ parser authors are mainly
concerned with are the 17 listed at the end of the above list.  They
relate to function-like cast vs. declaration, redundant parenthesized
TYPEDEFname redeclaration vs. old-style cast, etc.  The following
sections breeze through the "easy" conflicts, and then talk at length
about these tough ones.



17 EASY CONFLICTS, WITH HAPPY ENDINGS

The first group of 17 SR conflicts:

8 SR caused by operator function name with trailing * or &
	states: 131, 138, 281, 282
8 SR caused by freestore              with trailing * or &
	states: 571, 572, 778, 779
1 SR caused by dangling else and my laziness
	state: 1100

have  very simple resolutions.  If you are reading this, I assume that 
you are already familiar with the  if-if-else  conflict  (it  is  also 
analyzed in depth in the autodoc5.txt file).

The 8 conflicts based on "freestore with trailing * or &" can be hinted
at by the example:

        a = new int * * object;

Is the above the same as:

        a = (new int) * (* object);
or:
        a = (new (int *)) * object;

As a resolution,  the  "longest  possible  type"  is  isolated  by  my 
grammar.  The result is:

        a = (new (int * * )) ...

which leads to a syntax error in my example!  This resolution is
indeed what is quietly specified for C++ 2.0 (and implemented in
cfront).  The critical statement and example at the end of section
5.3.3 of the C++ 2.0 Ref Man make this resolution clear.

The  8 conflicts involving "operator function names with trailing * or 
&" are quite similar to what was just presented.  The critical fact is 
that "operator typename"  is  allowed  in  the  grammar  to  define  a 
function.  Whenever a function is provided, but NOT followed by a '(', 
the  address  of  the  function  is  implicitly  taken and used in the 
expression (see draft ANSI C standard for finer  details).   For  some 
class T, the following MIGHT all be valid statements:

        operator T;
        operator T*;
        operator T**;

If  the  above  are valid, then the interpretation of the following is 
ambiguous:

        operator T * * a;

The above might be interpreted as:

        (operator T) * (* a);
or
        (operator (T *)) * a;

The default LR rules parse the largest possible type, and lead to:

        (operator (T * * )) ...

which in our example leads to a syntax error.  Here again the "longest 
possible type..." rule supports my grammar.  Note that  this  rule  is 
actually  a  consequence  (in my opinion) of the cfront implementation 
via a YACC grammar, and the default resolution of  conflicts  in  that 
grammar.



1 SR CONFLICT WITH AN ALMOST HAPPY ENDING

1 SR caused by member declaration of sub-structure, with trailing :
	state: 64

Note that this conflict is different from the one isolated in the 6/90
version of this grammar, which pertained to ':'.

The conflict now takes place (as seen in the demonstrations section of 
y.output) in variations of the following program prefix:

	class A { class B :

The  problem  is  that  "class B" can be a declaration specifier, much 
like "int".  When a bit field is defined, then  "int  :"  provides  an 
unnamed  bit  field  member  of  a  structure.   My grammar avoids the 
reduction of "class B" to declaration specifier in this  context,  and 
hence  disambiguates  in  favor  of  a nested class, with a derivation 
list:

	class A { class B : Parent { ..... };};

Although this looks reasonable today, a real syntax problem will be
present once template classes are introduced and the supplied type may
be "int" (hence, the almost happy ending).  Specifically, one can
imagine a parameterized type that is given as input (i.e., an argument)
either "signed int" or "unsigned int", and then this type might be used
as a declaration specifier in a bit field.  When the type is referred
to during the parameterized class elaboration, the reference *can* be
made in the form "class T", even though "T" is simply "signed int".

For now (until the template syntax is worked out), the  disambiguation 
provided by this grammar will do very nicely.



6 NOVEL CONFLICTS THAT YIELD TO SEMANTIC DISAMBIGUATION

The conflicts that are discussed in this section have been deferred
(by A LOT of work, and A LOT of inline expansion) to occur when a ';'
or a '{' is reached.  At that point, semantic information in the
tokens can safely be used to decide which of two cases is at hand.

The conflicts, which are referred to as:

    6 RR caused by constructor declaration vs member declaration
	states: 1105, 1152, 1175


occur during a class/struct elaboration.  Consider the following class 
elaborations:

        typedef int T1, T2, T3  ;
        class GOO { int a;} ;
        class FOO {
                GOO    T1  ; // clearly a redefinition of T1
                FOO  ( T2 ); // clearly a constructor
                GOO  ( T3 ); // redefinition of T3
                };

Note  that  the  last  two entries in FOO's elaboration "FOO(T2);" and 
"GOO(T3);" are  tokenized  IDENTICALLY,  but  must  have  dramatically 
different  meanings.   When I first found this ambiguity I was hopeful 
that I could extend the lex hack that distinguishes TYPEDEFnames  from 
random  IDENTIFIERs,  and distinguish something like CURRENTCLASSname. 
Unfortunately, the  potential  for  elaborations  within  elaborations 
appears  to  make  such a hack unworkable.  In addition, once I got my 
grammar to defer all such ambiguous cases until a ';' was seen, I felt 
confident that the ambiguity was resolved (and the introduction of  an 
additional "hack" was unnecessary).

Note  that  the  situations  are  identical  when a '{' is seen, as it 
presents the start of the body of either a function, or a constructor, 
and an identical decision must be made.


1 CONFLICT THAT CONSTRAINTS SUPPORT THE RESOLUTION FOR

With this new grammar, the ability to make explicit calls to
destructors is supported.  As pointed out in section 12.4 of the ARM,
the implicit use of the "this" pointer to refer to a specific
destructor leads to an ambiguity with the unary "~" operation.  As a
result, in situations where it is possible to parse a sentence so that
the "~" is the unary operator, it is done.  The conflict is shown as:

1 SR caused by explicit call to destructor, without explicit scope
	state: 536

Note that the reduction:

	complex_name : '~' TYPEDEFname 

is  what  is  used  to develop a "name" that can be used to refer to a 
destructor  explicitly.   The  decision  is  made  to  not  use   this 
reduction,  in favor of "something else" (which results from a shift). 
Since the only alternative to specifying a destructor is to  make  "~" 
serve as a unary operator, we are assured that we support the standard 
given in the ARM.  Note that this should probably be officially listed 
as  a part of the syntax restrictions, but in any case, it is at least 
a disambiguating constraint, and we are guaranteed to support it.



THE TOUGH AMBIGUITIES: FUNCTION LIKE CASTS AND COMPANY (17 CONFLICTS)

The ambiguities listed in this section pertain to attempts to
distinguish declarations/type-names from
expression-statements/expressions.  For example:

    char *b ="external" ; // declare a variable to confuse us :-)
    main () {
        class S;
        S (*b)[5]; // redeclare "b" pointer to array of 5 S's ?
               // OR ELSE indirect through b; cast to S; index using 5 ?
        }

The above is what I call the "declaration vs function like cast
ambiguity".  Awareness of this ambiguity in this context appears
fairly widespread among C++ parser authors.  The ARM makes explicit
reference  to  this  problem in section 6.8 "Ambiguity Resolution".  I 
believe the underlying philosophy provided by the Reference Manual  is 
that  if a token stream can be interpreted by an ANSI C compiler to be 
a declaration, then a C++ compiler should disambiguate in favor  of  a 
declaration.  Unfortunately, section 6.8 goes on to say:

    "To  disambiguate,  the whole statement may have to be examined to 
    determine if it is an expression-statement, or a declaration.  ... 
    The disambiguation is purely syntactic; that is,  the  meaning  of 
    the  names, beyond whether they are type-names or not, is not used 
    in the disambiguation".

The  above  advice  only  forestalls  the  inevitable  ambiguity,  and 
complicates  the  language  in  the process.  The examples that follow 
will demonstrate the difficulties.

There are several other contexts where such  ambiguities  (typedef  vs 
expression) arise:

        1) Where a statement is valid (as shown above).
        2) As the argument to sizeof()
        3) Following "new", with the C++ syntax allowing a placement
                expression
        4) Immediately following a left paren  in  an  expression  (it 
                might be an old style cast, and hence a type name)
        5) Following  a  left paren, arguments to constructors can be 
                confused with prototype type-names.
        6) Recursively in any of the above,  following  a  left  paren 
                (what  follows might be argument expressions, or might 
                be function prototype parameter typing)

Examples of simple versions of the sizeof context are:

        class T;
        sizeof ( T    ); // sizeof (type-name)
        sizeof ( T[5] ); // again a type name
        sizeof ( T(5) ); // sizeof (expression)
        sizeof ( T()  ); // semantic error: sizeof "function returning T"?
                        // OR ELSE sizeof result of function like cast

Examples  of  the  old  style  cast ambiguity context, which are still 
ambiguous when the '(' after the 'T' has been seen:

        class T { /* put required declarations here */ 
                };
        a = (T(      5));  // function like cast of 5
        b = (T(      )) 0; // semantic error: cast of 0 to type "function
                        // returning T"

In constructors the following demonstrates the problems:

        class T;
        T (b)(int  d ); // same as "T b(int);", a function declaration
        T (d)(int (5)); // same as "T d(5);", an identifier declaration
        T (d)(int ( )); // ambiguous

The problem can appear recursively  in  the  following  examples.   By 
"recursively"  I  mean  that an ambiguity in the left-context has made 
the parser unsure of whether an "expression"  or  a  "type"  is  being 
parsed, and the ambiguity is continued by the token sequence.  After
the parser can determine what this subsequence is, it will in turn be
able to disambiguate what the prior tokens were.

Recursion on the statement/declaration context:

        class S;
        class T;
        S (*b)(T); // declare b "pointer to function taking T returning S"
        S (*c)(T dummy); // same declaration as for "b"
        int dummy;
        S (*d)(T (dummy)); // This T might be casting dummy

Recursion on the sizeof context is shown in the following examples. As 
before, the examples include semantic errors.

        class T;
        class S;
        sizeof ( T(S dummy) ); // sizeof "function taking S returning T"
        int dummy;
        sizeof ( T(S (dummy)) ); // sizeof "function taking S returning T"
                // OR ELSE cast dummy to S, and then cast that to T, which
                        // is the same as "sizeof T;"


The following is the list of conflicts that fall into the categories
listed above.  To see the complete details of each conflict state, see
the y.output file supplied with this grammar.


3 RR caused by function-like cast vs typedef redeclaration ambiguity