📄 grammar5.txt
Of these conflicts, the ones that most C++ parser authors are mainly
concerned with are the 17 listed at the end of the above list. They
relate to function-like-cast vs declaration, and redundant parened
TYPEDEFname redeclaration vs old style cast, etc. The following
sections breeze through the "easy" conflicts, and then talk at length
about these tough ones.
17 EASY CONFLICTS, WITH HAPPY ENDINGS
The first group of 17 SR conflicts:
8 SR caused by operator function name with trailing * or &
states: 131, 138, 281, 282
8 SR caused by freestore with trailing * or &
states: 571, 572, 778, 779
1 SR caused by dangling else and my laziness
state: 1100
have very simple resolutions. If you are reading this, I assume that
you are already familiar with the if-if-else conflict (it is also
analyzed in depth in the autodoc5.txt file).
The 8 conflicts based on "freestore with trailing * or &" can be hinted
at by the example:
a = new int * * object;
Is the above the same as:
a = (new int) * (* object);
or:
a = (new (int *)) * object;
As a resolution, the "longest possible type" is isolated by my
grammar. The result is:
a = (new (int * * )) ...
which leads to a syntax error in my example! This resolution is
indeed what is quietly specified for C++ 2.0 (and implemented in
cfront). The critical statement and example in the C++ 2.0 Ref Man at
the end of section 5.3.3 makes this resolution clear.
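The "longest possible type" rule can be observed with a small sketch.
This is an illustration only: it assumes a present-day C++ compiler
rather than cfront, and the function name longest_type_demo is
hypothetical. The initialization of p compiles only because both
trailing '*' tokens are absorbed into the type that follows "new".

```cpp
#include <cassert>

// Hypothetical helper: allocates with "new int * *" and reads a value back.
int longest_type_demo() {
    // The type after "new" swallows both '*' tokens, so one int** object
    // is allocated and the expression has type int***.
    int ***p = new int * *;
    int value = 42;
    int *inner = &value;
    *p = &inner;          // store an int** in the allocated object
    int result = ***p;    // read value back through three indirections
    delete p;
    return result;
}
```

Writing "new int * * object" instead leaves "object" with nowhere to
go, which is the syntax error described above.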
The 8 conflicts involving "operator function names with trailing * or
&" are quite similar to what was just presented. The critical fact is
that "operator typename" is allowed in the grammar to define a
function. Whenever a function is provided, but NOT followed by a '(',
the address of the function is implicitly taken and used in the
expression (see draft ANSI C standard for finer details). For some
class T, the following MIGHT all be valid statements:
operator T;
operator T*;
operator T**;
If the above are valid, then the interpretation of the following is
ambiguous:
operator T * * a;
The above might be interpreted as:
(operator T) * (* a);
or
(operator (T *)) * a;
The default LR rules parse the largest possible type, and lead to:
(operator (T * * )) ...
which in our example leads to a syntax error. Here again the "longest
possible type..." rule supports my grammar. Note that this rule is
actually a consequence (in my opinion) of the cfront implementation
via a YACC grammar, and the default resolution of conflicts in that
grammar.
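A minimal sketch of the same rule applied to conversion function
names follows. It assumes a present-day C++ compiler, and the names
Holder and read_through are invented for the illustration. The '*'
that follows "operator int" is taken as part of the function name
rather than as a multiplication.

```cpp
#include <cassert>

struct Holder {
    int value;
    // Conversion function whose name is "operator int *".
    operator int *() { return &value; }
};

int read_through(Holder &h) {
    // The '*' after "int" is absorbed into the function name, so this calls
    // the conversion function and then dereferences the int* it returns.
    return *h.operator int *();
}
```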
1 SR CONFLICT WITH AN ALMOST HAPPY ENDING
1 SR caused by member declaration of sub-structure, with trailing :
state: 64
Note that this conflict is different from the one isolated in the 6/90
version of this grammar, that pertained to ':'.
The conflict now takes place (as seen in the demonstrations section of
y.output) in variations of the following program prefix:
class A { class B :
The problem is that "class B" can be a declaration specifier, much
like "int". When a bit field is defined, then "int :" provides an
unnamed bit field member of a structure. My grammar avoids the
reduction of "class B" to declaration specifier in this context, and
hence disambiguates in favor of a nested class, with a derivation
list:
class A { class B : Parent { ..... };};
Although this looks reasonable today, when template classes are
introduced, and the supplied type may be "int", then a real syntax
problem will be present (hence, the almost happy ending).
Specifically, one can imagine a parameterized type, which is given as
input (i.e., an argument) either "signed int", or "unsigned int", and
then this type might be used as a declaration specifier in a bit
field. When the type is referred to during the parameterized class
elaboration, the reference *can* be made in the form "class T", even
though "T" is simply "signed int".
For now (until the template syntax is worked out), the disambiguation
provided by this grammar will do very nicely.
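The two readings of a trailing ':' can be placed side by side in one
class. The sketch below is only an illustration under the assumption
of a present-day C++ compiler; it uses "struct" in place of "class" to
avoid access specifiers, and Parent, extra, flags, and
nested_class_has_base are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

struct Parent { int tag; };

struct A {
    struct B : Parent { int extra; };  // "struct B :" begins a derivation list
    unsigned int : 3;                  // "unsigned int :" is an unnamed bit field
    unsigned int flags : 5;            // named bit field, for comparison
};

// True when the nested class really was parsed with Parent as its base.
bool nested_class_has_base() {
    return std::is_base_of<Parent, A::B>::value;
}
```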
6 NOVEL CONFLICTS THAT YIELD TO SEMANTIC DISAMBIGUATION
The conflicts that are discussed in this section have been deferred
(by A LOT of work, and A LOT of inline expansion) to occur when a ';',
or a '{' is reached. At that point, semantic information in the
tokens can safely be used to decide which of the two cases is at hand.
The conflicts are referred to as:
6 RR caused by constructor declaration vs member declaration
states: 1105, 1152, 1175
occur during a class/struct elaboration. Consider the following class
elaborations:
typedef int T1, T2, T3 ;
class GOO { int a;} ;
class FOO {
GOO T1 ; // clearly a redefinition of T1
FOO ( T2 ); // clearly a constructor
GOO ( T3 ); // redefinition of T3
};
Note that the last two entries in FOO's elaboration "FOO(T2);" and
"GOO(T3);" are tokenized IDENTICALLY, but must have dramatically
different meanings. When I first found this ambiguity I was hopeful
that I could extend the lex hack that distinguishes TYPEDEFnames from
random IDENTIFIERs, and distinguish something like CURRENTCLASSname.
Unfortunately, the potential for elaborations within elaborations
appears to make such a hack unworkable. In addition, once I got my
grammar to defer all such ambiguous cases until a ';' was seen, I felt
confident that the ambiguity was resolved (and the introduction of an
additional "hack" was unnecessary).
Note that the situations are identical when a '{' is seen, as it
presents the start of the body of either a function, or a constructor,
and an identical decision must be made.
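The FOO/GOO elaboration above can be checked against a present-day
compiler. The sketch below is an assumption-laden illustration, not
output of this grammar: it adds "public:" so the members can be
inspected, and the two helper functions are hypothetical. It confirms
that "FOO ( T2 );" is taken as a constructor while "GOO ( T3 );" is
taken as a member named T3.

```cpp
#include <cassert>
#include <type_traits>

typedef int T1, T2, T3;
class GOO { int a; };
class FOO {
public:
    GOO T1;        // member T1 of type GOO (hides the typedef)
    FOO ( T2 );    // constructor taking an int; declared only, never called
    GOO ( T3 );    // member T3 of type GOO (hides the typedef)
};

// True when "GOO ( T3 );" declared a data member of type GOO.
bool t3_is_member_of_type_goo() {
    return std::is_same<decltype(FOO::T3), GOO>::value;
}

// True when "FOO ( T2 );" declared a constructor accepting an int.
bool foo_has_int_constructor() {
    return std::is_constructible<FOO, int>::value;
}
```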
1 CONFLICT WHOSE RESOLUTION IS SUPPORTED BY CONSTRAINTS
With this new grammar, the ability to make explicit calls to
destructors is supported. As pointed out in section 12.4 of the ARM,
the implicit use of the "this" pointer to refer to a specific
destructor leads to an ambiguity with the unary "~" operation. As a
result, in situations where it is possible to parse a sentence so that
the "~" is the unary operator, it is done. The conflict is shown as:
1 SR caused by explicit call to destructor, without explicit scope
state: 536
Note that the reduction:
complex_name : '~' TYPEDEFname
is what is used to develop a "name" that can be used to refer to a
destructor explicitly. The decision is made to not use this
reduction, in favor of "something else" (which results from a shift).
Since the only alternative to specifying a destructor is to make "~"
serve as a unary operator, we are assured that we support the standard
given in the ARM. Note that this should probably be officially listed
as a part of the syntax restrictions, but in any case, it is at least
a disambiguating constraint, and we are guaranteed to support it.
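The constraint can be illustrated with a short sketch. It assumes a
present-day C++ compiler and two's complement integers; the class
Mask and its member probe are hypothetical. Inside a member function,
"~Mask()" is treated as the unary operator applied to a temporary, and
not as a destructor call on the current object.

```cpp
#include <cassert>

struct Mask {
    int bits;
    Mask() : bits(0) {}
    explicit Mask(int b) : bits(b) {}
    ~Mask() {}
    int operator~() const { return ~bits; }
    int probe() const {
        // Unary '~' applied to the temporary Mask(); the current object
        // is not destroyed by this expression.
        return ~Mask();
    }
};
```

An explicit destructor call would have to be written with an explicit
object, as in "this->~Mask()".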
THE TOUGH AMBIGUITIES: FUNCTION LIKE CASTS AND COMPANY (17 CONFLICTS)
The ambiguities listed in this section pertain to attempts to
distinguish declarations/type-names from
expression-statements/expressions. For example:
char *b ="external" ; // declare a variable to confuse us :-)
main () {
class S;
S (*b)[5]; // redeclare "b" pointer to array of 5 S's ?
// OR ELSE indirect through b; cast to S; index using 5 ?
}
The above is what I call the "declaration vs function like cast
ambiguity". Awareness about this ambiguity in this context appears
fairly widespread among C++ parser authors. The ARM makes explicit
reference to this problem in section 6.8 "Ambiguity Resolution". I
believe the underlying philosophy provided by the Reference Manual is
that if a token stream can be interpreted by an ANSI C compiler to be
a declaration, then a C++ compiler should disambiguate in favor of a
declaration. Unfortunately, section 6.8 goes on to say:
"To disambiguate, the whole statement may have to be examined to
determine if it is an expression-statement, or a declaration. ...
The disambiguation is purely syntactic; that is, the meaning of
the names, beyond whether they are type-names or not, is not used
in the disambiguation".
The above advice only forestalls the inevitable ambiguity, and
complicates the language in the process. The examples that follow
will demonstrate the difficulties.
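As a hedged illustration of the rule quoted from section 6.8, the
sketch below repeats the "S (*b)[5];" example and asks a present-day
compiler what it declared. S is given a body here (an assumption,
since an array of an incomplete type would be rejected), and the
function name is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

struct S { int v; };
const char *b = "external";   // outer b, as in the example above

// True when the statement was taken as a declaration of a new, inner b.
bool statement_declares_inner_b() {
    S (*b)[5];   // declaration: pointer to array of 5 S's, hiding ::b
    return std::is_same<decltype(b), S (*)[5]>::value;
}
```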
There are several other contexts where such ambiguities (typedef vs
expression) arise:
1) Where a statement is valid (as shown above).
2) As the argument to sizeof()
3) Following "new", with the C++ syntax allowing a placement
expression
4) Immediately following a left paren in an expression (it
might be an old style cast, and hence a type name)
5) Following a left paren, arguments to constructors can be
confused with prototype type-names.
6) Recursively in any of the above, following a left paren
(what follows might be argument expressions, or might
be function prototype parameter typing)
Examples of simple versions of the sizeof context are:
class T;
sizeof ( T ); // sizeof (type-name)
sizeof ( T[5] ); // again a type name
sizeof ( T(5) ); // sizeof (expression)
sizeof ( T() ); // semantic error: sizeof "function returning T"?
// OR ELSE sizeof result of function like cast
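The first three sizeof forms can be compared directly. This is a
minimal sketch assuming a present-day C++ compiler; T is given a
hypothetical body with an int constructor so that "T(5)" is a valid
expression, and the three function names are invented. The fourth
form, "sizeof ( T() )", is left out because it does not compile.

```cpp
#include <cassert>
#include <cstddef>

struct T {
    char pad[8];
    T(int) {}
};

std::size_t size_of_type()       { return sizeof ( T ); }      // type-name
std::size_t size_of_array_type() { return sizeof ( T[5] ); }   // type-name
std::size_t size_of_expression() { return sizeof ( T(5) ); }   // expression
```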
Examples of the old style cast ambiguity context, which are still
ambiguous when the '(' after the 'T' has been seen:
class T { /* put required declarations here */
};
a = (T( 5)); // function like cast of 5
b = (T( )) 0; // semantic error: cast of 0 to type "function
// returning T"
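For comparison, the sketch below places the parenthesized
function-like cast next to an old style cast of the same value. It
assumes a present-day C++ compiler; the body of T and both function
names are hypothetical. The "(T( )) 0" form is omitted because it is
the semantic error noted above.

```cpp
#include <cassert>

struct T {
    int v;
    T(int x) : v(x) {}
};

int function_like_cast() {
    T a = (T( 5));   // parenthesized function-like cast of 5
    return a.v;
}

int old_style_cast() {
    T a = (T) 5;     // old style cast of 5
    return a.v;
}
```

Both forms begin with "(T", which is why the parser cannot tell them
apart at that point.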
In constructors the following demonstrates the problems:
class T;
T (b)(int d ); // same as "T b(int);", a function declaration
T (d)(int (5)); // same as "T d(5);", an identifier declaration
T (d)(int ( )); // ambiguous
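The three declarations can be given to a present-day compiler to see
how each is resolved. The sketch is an illustration under stated
assumptions: T has a hypothetical int constructor, the third
declaration is renamed "e" so that all three can coexist, and the
helper functions are invented. The compiler used for this sketch
settles the ambiguous case in favor of a function declaration.

```cpp
#include <cassert>
#include <type_traits>

struct T {
    int v;
    T(int x) : v(x) {}
};

T (b)(int d);      // function declaration, same as "T b(int);"
T (d)(int (5));    // object declaration, same as "T d(5);"
T (e)(int ( ));    // resolved as a function taking "int (*)()"

bool b_is_function() { return std::is_same<decltype(b), T (int)>::value; }
bool d_is_object()   { return std::is_same<decltype(d), T>::value && d.v == 5; }
bool e_is_function() { return std::is_function<decltype(e)>::value; }
```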
The problem can appear recursively in the following examples. By
"recursively" I mean that an ambiguity in the left-context has made
the parser unsure of whether an "expression" or a "type" is being
parsed, and the ambiguity is continued by the token sequence. After
the parser can determine what this subsequence is, it will in turn be
able to disambiguate what the prior tokens were.
Recursion on the statement/declaration context:
class S;
class T;
S (*b)(T); // declare b "pointer to function taking T returning S"
S (*c)(T dummy); // same declaration as for "b"
int dummy;
S (*d)(T (dummy)); // This T might be casting dummy
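A present-day compiler can be asked how it reads the last line. The
sketch below is only an illustration, with a hypothetical helper
name; it shows that "T (dummy)" is taken as a parameter declaration
with redundant parentheses, so that d has the same type as b and c.

```cpp
#include <cassert>
#include <type_traits>

class S;
class T;
int dummy;
S (*d)(T (dummy));   // "T (dummy)" is read as a parameter named dummy

// True when d is "pointer to function taking T returning S".
bool d_is_function_pointer() {
    return std::is_same<decltype(d), S (*)(T)>::value;
}
```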
Recursion on the sizeof context is shown in the following examples. As
before, the examples include semantic errors.
class T;
class S;
sizeof ( T(S dummy) ); // sizeof "function taking S returning T"
int dummy;
sizeof ( T(S (dummy)) ); // sizeof "function taking S returning T"
// OR ELSE cast dummy to S, and then cast that to T, which
// is the same as "sizeof T;"
The following is the list of conflicts that fall into the categories
listed above. To see the complete details of each conflict state see
the y.output file supplied with this grammar.
3 RR caused by function-like cast vs typedef redeclaration ambiguity